Title: File Comparison
Ken,
 
Try reading only nut.dat into memory, and then process data.dat one line at a time. If data.dat is huge, you're probably filling up memory and possibly paging a lot while you iterate through it. It makes perfect sense to load nut.dat into memory, since you iterate over its elements many times, but each line of data.dat only needs to be processed once. In your example, you actually loop through every line of data.dat once for each element of nut.dat, which is compounded by the fact that you take a substr from the string each time through (it would be better to do that once per line).
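
As a rough sketch of just this first change (nut.dat held in memory, data.dat streamed, the first field extracted once per line), something like the following should work. The sub name filter_stream and the in-memory filehandles standing in for the real files are made up for illustration, and it assumes the first field is delimited by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each line from $in to $out when its first
# whitespace-delimited field appears in @$nuts. Only one line of the
# large file is held in memory at a time.
sub filter_stream {
    my ($nuts, $in, $out) = @_;
    my $matched = 0;
    while (my $line = <$in>) {
        # Extract the first field once per line, not once per comparison.
        my ($key) = split ' ', $line, 2;
        next unless defined $key;      # skip blank lines
        for my $nut (@$nuts) {
            if ($nut eq $key) {
                print {$out} $line;
                $matched++;
                last;                  # one match is enough for this line
            }
        }
    }
    return $matched;
}

# Demo: in-memory filehandles stand in for data.dat and box.dat.
my @nut_list   = qw(12345678912 56789876543);
my $stream_in  = "12345678912  23456098734\n99999999999  87456321452\n";
my $stream_out = '';
open my $in,  '<', \$stream_in  or die $!;
open my $out, '>', \$stream_out or die $!;
my $stream_count = filter_stream(\@nut_list, $in, $out);
close $in;
close $out;
print $stream_out;
```

This still scans the whole list of nuts for every line, so it only addresses the memory side of the problem.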
 
Another thing that should (may?) make a substantial performance difference is to load nut.dat into a hash to make the comparison simpler: a single hash lookup should be much faster than iterating through the entire array, even if nut.dat is relatively small. Try this:
 
#!/bin/perl -w
 
use strict;
 
my %nuts; # no jokes here please :)
open FILE1, '< nut.dat' or
    die $!;
while(<FILE1>)
{
    chomp;
    $nuts{$_}=1;
}
close FILE1;
 
open OUT, '>> box.dat' or
    die $!;
open FILE2, '< data.dat' or
    die $!;
while(my $line=<FILE2>)
{
    my($num);
    ($num, undef)=split /\s+/,$line, 2;
    if(defined $nuts{$num})
    {
        print OUT $line;
    }
}
close FILE2;
close OUT;

I threw in a few other tidbits, such as checking the result of a file open. I'm sure this could be optimized further, but this should be a substantial improvement, and I figured it'd be better if it was reasonably easy to follow.
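
For what it's worth, here is one possible variation on the same idea, not necessarily any faster: lexical filehandles, three-argument open, and exists for the lookup. The sub names load_keys and filter_by_key are invented for this sketch, and the demo feeds it the sample data from the original message through in-memory filehandles rather than the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: read one key per line into a hash so that each
# later lookup is a single hash access.
sub load_keys {
    my ($fh) = @_;
    my %keys;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';           # ignore empty lines
        $keys{$line} = 1;
    }
    return \%keys;
}

# Hypothetical helper: stream $in and copy to $out every line whose
# first whitespace-delimited field is a key in %$keys.
sub filter_by_key {
    my ($keys, $in, $out) = @_;
    my $matched = 0;
    while (my $line = <$in>) {
        my ($key) = split ' ', $line, 2;
        next unless defined $key && exists $keys->{$key};
        print {$out} $line;
        $matched++;
    }
    return $matched;
}

# Sample data from the original message, held in strings for the demo.
my $nut_text  = "12345678912\n56789876543\n23456789652\n34567123456\n";
my $data_text = "12345678912  23456098734  12348907678  67342519806\n"
              . "23456789652  87456321452  45231987564  23675843902\n"
              . "34567123456  23456709819  25361728980  49872653418\n";
my $box_text  = '';

open my $nut_fh,  '<', \$nut_text  or die $!;
open my $data_fh, '<', \$data_text or die $!;
open my $box_fh,  '>', \$box_text  or die $!;

my $nut_keys  = load_keys($nut_fh);
my $box_count = filter_by_key($nut_keys, $data_fh, $box_fh);

close $nut_fh;
close $data_fh;
close $box_fh;
print $box_text;
```

With real files you would open them by name instead, for example open my $data_fh, '<', 'data.dat' or die "data.dat: $!";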
 
Hope this helps,
 
Barry
 
-----Original Message-----
From: Kenneth Jideofor [ MTN - Ikoyi ] [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 04, 2003 4:18 PM
To: [EMAIL PROTECTED]
Subject: File Comparison


Hi Guys,

I have two files, nut.dat and data.dat.
The first file, nut.dat, contains lines of eleven digit figures; each line in this file is an eleven-digit figure.
The second file, data.dat, contains lines of spaced figures; each line having four groups of eleven-digit figures.
In other words, each line in the file, nut.dat, has one field, while each line in the file, data.dat, has four fields.

I want to compare each field in the nut.dat file against the first field of each line in the data.dat file. Where the field in the nut.dat file matches the first field in any of the lines in the data.dat file, the entire line in the data.dat file is saved to a file, box.dat.

For example, the content of each file can be viewed as follows:

nut.dat file:
12345678912
56789876543
23456789652
34567123456

data.dat file:
12345678912  23456098734  12348907678  67342519806
23456789652  87456321452  45231987564  23675843902
34567123456  23456709819  25361728980  49872653418


I wrote the following Perl script to perform the above task.
The script works very fast for a small data.dat file, but it is extremely slow with a very large data.dat file.
I need to make the script faster.

Could you, please, assist me with an improved version of the script?

Regards,
Ken

____________ My Perl Script______________________

#!/bin/perl -w
open (FILE1, "/opt/MISC/AUDIT/nut.dat");
open (FILE2, "/opt/MISC/AUDIT/data.dat");
open (OUT, ">>/opt/MISC/AUDIT/box.txt");
@nuts = <FILE1>;
@database = <FILE2>;
close(FILE1);
close(FILE2);

foreach $nut(@nuts) {
        chomp($nut);
foreach $database(@database) {
        chomp(@database);
        $data = substr($database, 0, 11);   # assumed: first eleven-digit field; the original expression was garbled
        if ($data eq $nut) {
                print OUT "$database\n";
                }      
        }
}
close(OUT);


