On Thu, Aug 11, 2011 at 1:10 AM, Uri Guttman <u...@stemsystems.com> wrote:
> >>>>> "KS" == Kevin Spencer <ke...@kevinspencer.org> writes: > > KS> On Wed, Aug 10, 2011 at 4:04 AM, Rob Coops <rco...@gmail.com> wrote: > >> #!/usr/bin/perl > >> > >> use strict; > >> use warnings; > >> use File::Slurp; # A handy module check it out at: > >> http://search.cpan.org/~uri/File-Slurp-9999.19/lib/File/Slurp.pm > > KS> While handy, be aware that you are slurping the entire file into > KS> memory, so just be careful if you're going to be processing huge > KS> files. > > in general i would agree to never slurp in most genetics files which can > be in the many GB sizes and up. the OP says the file has up to 10M > letters which is fine to slurp on any modern machine. > > uri > > -- > Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com-- > ------------ Perl Developer Recruiting and Placement Services > ------------- > ----- Perl Code Review, Architecture, Development, Training, Support > ------- > > -- > To unsubscribe, e-mail: beginners-unsubscr...@perl.org > For additional commands, e-mail: beginners-h...@perl.org > http://learn.perl.org/ > > > Believe it or not but I actually did count the number of zero's there ;-) I know that bio data tends to be rather large but looking at the size i figured it cannot hurt... though indeed if you are going for something more substantial you will want to use a different method of reading the file that reads the file in bits of 2MB at the time or so. Of course if you are pulling out only characters X to Y and you are certain that there is nothing but normal characters in the file you could simply start reading the file from point X and continue to Y, there is no need to loop over the whole thing 2M characters at a time. But beware that making such assmptions will always lead to failure at some point as there will always be one file that contains something else that you didn't expect. 
Even if that file does not show up in testing, after a few years and a few hundred thousand files you will run into one at some point. (It is the simple principle of increasing your sample size: eventually you will find an outlier in there.)

Regards,

Rob
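For completeness, the seek-straight-to-the-range approach mentioned above could be sketched like this. It only works under exactly the risky assumption being warned about: every character is one byte, and there are no headers, newlines, or other surprises inside the range. The file name, contents, and offsets are all invented for the demo.

```perl
use strict;
use warnings;

# Create a small demo file with known contents.
my $file = 'demo_range.txt';
open my $out, '>', $file or die "Cannot create $file: $!";
print $out join( '', 'A' .. 'Z' ) x 10;    # 260 bytes of known data
close $out;

my ( $x, $y ) = ( 26, 30 );    # 0-based start offset, inclusive end

open my $in, '<', $file or die "Cannot open $file: $!";
binmode $in;
seek $in, $x, 0 or die "Cannot seek: $!";    # jump straight to X
read $in, my $segment, $y - $x + 1;          # read only the X..Y span
close $in;
print "$segment\n";    # prints "ABCDE"
```

No loop and no wasted I/O, but one stray newline or FASTA header in the file and the offsets silently point at the wrong characters, which is why the chunked (or at least validated) approach is the safer default.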