On 05/03/2013 20:41, Chris Stinemetz wrote:
Hello List,
I am working on a script to parse large files; by large I mean 4 million+
lines, and when splitting on the delimiter (;) there are close to 300
fields per record, but I am only interested in the first 44.
I have begun testing to see how fast the file can be read in a few
different scenarios:
while (<>) {
}
It only takes about 6 seconds to read 4,112,220 lines.
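The read-only pass can be timed with the core Time::HiRes module; a
minimal sketch of that kind of harness (not the original script):
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0    = [gettimeofday];
my $lines = 0;
while (<>) {
    $lines++;    # read-only pass: no chomp, no split
}
printf "read %d lines in %.2f seconds\n", $lines, tv_interval($t0);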
But when I introduce split as such:
while (<>) {
    chomp;
    my @tokens = split /;/, $_;
}
It takes around 7 minutes to reach eof.
I also tried using a LIMIT on split, as shown below.
It helped greatly, taking only a little over 1 minute, but I am curious
whether there is a way to improve the read time further, or if this is a
reasonable time.
while (<>) {
    chomp;
    my @tokens = split /;/, $_, 44;
}
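The no-limit vs. LIMIT comparison can be reproduced without the
4-million-line file using the core Benchmark module; a sketch on a
made-up 300-field record:
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-in record: 300 ";"-separated fields (real data will differ).
my $line = join ';', map { "field$_" } 1 .. 300;

cmpthese( -3, {    # run each for at least 3 CPU seconds
    no_limit => sub { my @tokens = split /;/, $line },
    limit_45 => sub { my @tokens = split /;/, $line, 45 },
});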
Hi Chris
My first thought is that this is probably the best you are going to get.
Note, though, that you want a limit of 45 on split to get the first 44
fields, as otherwise the 44th field will have the rest of the line attached.
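A quick demonstration of the off-by-one, on made-up data:
use strict;
use warnings;

my $line = join ';', 1 .. 50;                  # fields are just "1" .. "50"
my @limit_44 = split /;/, $line, 44;
my @limit_45 = split /;/, $line, 45;
print "limit 44, field 44: $limit_44[43]\n";   # "44;45;46;47;48;49;50" -- remainder attached
print "limit 45, field 44: $limit_45[43]\n";   # "44" -- the remainder sits in $limit_45[44]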
Actually, assigning only the first 44 fields may speed things up, rather
than copying the trailing 256 fields to the array when you don't need
them. Try
my @tokens;
@tokens[0..43] = split /;/, $_, 45;
or perhaps
my @tokens = (split /;/, $_, 45)[0..43];
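If you want to know which of those two actually wins on your data,
Benchmark can settle it; again a sketch on a made-up 300-field record:
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = join ';', map { "field$_" } 1 .. 300;

cmpthese( -3, {
    slice_assign => sub { my @tokens; @tokens[0..43] = split /;/, $line, 45 },
    list_slice   => sub { my @tokens = (split /;/, $line, 45)[0..43] },
});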
HTH,
Rob