On 8/13/07, Luba Pardo <[EMAIL PROTECTED]> wrote:
> Dear list:
> I wrote a script that takes a list of ids from an input file and stores them
> in an array in a pairwise manner (if the total list is n then the array is
> (n^2)-n). I need to extract, for each pair of ids, a certain value from a
> huge file that contains the pair of ids and the value (format of the file:
> col1 col2  id1 id2  value).
> The script works but it takes too long, especially because the second file
> is too big (more than 600 MB).
> I would like to increase the speed of the script, but I haven't quite worked
> out what the best way to do it is.
> Any tip?
> Thanks in advance,
> L. Pardo
> ps, I am attaching the script
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> http://learn.perl.org/
>
>

Beyond being poorly indented and written with C-style idioms (instead
of Perl idioms), your script's biggest problem is that it splits the
values in the same arrays over and over again. You should move the
splitting of @a3 and @a4 outside of the nested loops at the end. Other
wastes of time and space include (but are not limited to) building a
file just to read it back in, and reading entire files into memory
when all that is done with the resulting array is to loop over it.
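
To illustrate hoisting the split out of the loops, here is a minimal
sketch. I am guessing at the data: the contents of @a3 and @a4 below are
made-up stand-ins, and find_values is a hypothetical helper, not
something from your script. The point is only that each line is split
once, before the nested loops, rather than on every inner pass.

```perl
use strict;
use warnings;

# Made-up stand-ins for the script's @a3 (wanted id pairs) and
# @a4 (lines in the format: col1 col2 id1 id2 value).
my @a3 = ("id1 id2", "id1 id3");
my @a4 = ("x y id1 id3 0.5", "x y id1 id2 0.9");

# Split every line exactly once, up front.
my @wanted = map { [split ' '] } @a3;
my @rows   = map { [split ' '] } @a4;

# Hypothetical helper: the nested loops now only compare fields that
# were already split.
sub find_values {
    my ($wanted, $rows) = @_;
    my @out;
    for my $w (@$wanted) {
        for my $r (@$rows) {
            push @out, $r->[4]
                if $r->[2] eq $w->[0] and $r->[3] eq $w->[1];
        }
    }
    return @out;
}

print "$_\n" for find_values(\@wanted, \@rows);
```

This still compares every pair against every row, so it only removes
the repeated splitting, not the nested loops themselves.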

Overall, your description of the problem lends itself to a hash tied
to a dbm file whose keys are the combined ids from the big file
(rebuild the dbm whenever the big file is newer than the dbm). Once
you have that, your complicated loop that checks whether the paired
ids are in the big file becomes

for my $pair (@pairs) {
    my $key = "@$pair";
    if (exists $ids{$key}) {
        print "$ids{$key}\n";
    } else {
        # $not_found is a filehandle opened for the unmatched pairs
        print $not_found "@$pair\n";
    }
}
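
That loop expects @pairs to hold array references, so that "@$pair"
gives a key like "id1 id2". A minimal sketch of one way to build it,
assuming unordered pairs are enough because the dbm is keyed under both
orderings of the ids (make_pairs is a hypothetical helper, not from
your script):

```perl
use strict;
use warnings;

# Hypothetical helper: return every unordered pair of the given ids
# as an array reference.
sub make_pairs {
    my @ids = @_;
    my @pairs;
    for my $i (0 .. $#ids - 1) {
        for my $j ($i + 1 .. $#ids) {
            push @pairs, [ $ids[$i], $ids[$j] ];
        }
    }
    return @pairs;
}

my @pairs = make_pairs(qw(a b c));
print "@$_\n" for @pairs;
```

For n ids this yields n*(n-1)/2 pairs, half of what you get when both
orderings are stored.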

The code to build the dbm file would look something like this:

use DB_File;

my %ids;
tie %ids, 'DB_File', "bigfile_db"
    or die "Cannot tie bigfile_db: $!";

while (<$bigfile>) {
    # assuming the format you describe (col1 col2 id1 id2 value),
    # keep id1, id2 and the value
    my @fields = (split ' ')[2,3,4];
    # store this line under either ordering of the ids
    $ids{"@fields[0,1]"} = "@fields";
    $ids{"@fields[1,0]"} = "@fields";
}
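
For the "rebuild the dbm if the big file is newer" part, a rough sketch
using the -M file test might look like the following. needs_rebuild is
a hypothetical helper, and the file names in the usage comment are only
assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the dbm file is missing or was
# modified less recently than the big file. -M returns a file's age
# in days relative to script start, so a smaller value means a more
# recent modification.
sub needs_rebuild {
    my ($bigfile, $dbfile) = @_;
    return 1 unless -e $dbfile;
    my $big_age = -M $bigfile;
    my $db_age  = -M $dbfile;
    return $big_age < $db_age ? 1 : 0;
}

# Usage (file names are assumptions):
#   rebuild_db() if needs_rebuild("bigfile.txt", "bigfile_db");
```

Two files written within the same second count as equally old here, so
in that case the dbm is not rebuilt.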
