On Thu, Aug 11, 2011 at 1:10 AM, Uri Guttman <u...@stemsystems.com> wrote:
> >>>>> "KS" == Kevin Spencer <ke...@kevinspencer.org> writes: > > KS> On Wed, Aug 10, 2011 at 4:04 AM, Rob Coops <rco...@gmail.com> wrote: > >> #!/usr/bin/perl > >> > >> use strict; > >> use warnings; > >> use File::Slurp; # A handy module check it out at: > >> http://search.cpan.org/~uri/File-Slurp-9999.19/lib/File/Slurp.pm > > KS> While handy, be aware that you are slurping the entire file into > KS> memory, so just be careful if you're going to be processing huge > KS> files. > > in general i would agree to never slurp in most genetics files which can > be in the many GB sizes and up. the OP says the file has up to 10M > letters which is fine to slurp on any modern machine. > > uri > > -- > Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com-- > ------------ Perl Developer Recruiting and Placement Services > ------------- > ----- Perl Code Review, Architecture, Development, Training, Support > ------- > > -- > To unsubscribe, e-mail: beginners-unsubscr...@perl.org > For additional commands, e-mail: beginners-h...@perl.org > http://learn.perl.org/ > > > Believe it or not but I actually did count the number of zero's there ;-) I know that bio data tends to be rather large but looking at the size i figured it cannot hurt... though indeed if you are going for something more substantial you will want to use a different method of reading the file that reads the file in bits of 2MB at the time or so. Of course if you are pulling out only characters X to Y and you are certain that there is nothing but normal characters in the file you could simply start reading the file from point X and continue to Y, there is no need to loop over the whole thing 2M characters at a time. But beware that making such assmptions will always lead to failure at some point as there will always be one file that contains something else that you didn't expect. 
Even if that file does not show up in testing, after a few years and a few hundred thousand files you will run into one at some point. (It is the simple principle of increasing your sample size: eventually you will find an outlier in there.)

Regards,

Rob
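For completeness, the seek-straight-to-the-range approach mentioned above could be sketched like this. It only works under exactly the risky assumption being warned about: every character is one byte, and there are no headers, newlines, or other surprises inside the range. The file name, contents, and offsets are all invented for the demo.

```perl
use strict;
use warnings;

# Create a small demo file with known contents.
my $file = 'demo_range.txt';
open my $out, '>', $file or die "Cannot create $file: $!";
print $out join( '', 'A' .. 'Z' ) x 10;    # 260 bytes of known data
close $out;

my ( $x, $y ) = ( 26, 30 );    # 0-based start offset, inclusive end

open my $in, '<', $file or die "Cannot open $file: $!";
binmode $in;
seek $in, $x, 0 or die "Cannot seek: $!";    # jump straight to X
read $in, my $segment, $y - $x + 1;          # read only the X..Y span
close $in;
print "$segment\n";    # prints "ABCDE"
```

No loop and no wasted I/O, but one stray newline or FASTA header in the file and the offsets silently point at the wrong characters, which is why the chunked (or at least validated) approach is the safer default.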