Hi,
  I have a script that finds duplicate rows in a file. The file has
13 million records, and no more than 5% of them are duplicates.

To find the duplicates I am using the following function:

while (<FH>) {
    if (find_duplicates()) {
        $dup++;
    }
}

# Returns 1 if the record is a duplicate, 0 if it is not.
sub find_duplicates
{
        my $key = substr($_, 10, 10);
        if ( exists $keys{$key} ) {
                $keys{$key}++;
                return 1;       # duplicate row
        } else {
                $keys{$key}++;
                return 0;       # not a duplicate
        }
}
---------------------------------------------
Here I am storing 13 million keys in the hash, and I think that is why
I am running out of memory.

How can I avoid this?
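
Would something like tying %keys to an on-disk DB_File BTree help, so the
keys live in a file instead of RAM? A rough sketch of what I mean (keys.db
and records.txt are placeholder names, and I have not tried this on the
full 13-million-record file):

use strict;
use warnings;
use DB_File;
use Fcntl;

# Keep the seen-key table on disk instead of in memory by tying the
# hash to a DB_File BTree backed by a scratch file.
my %keys;
tie %keys, 'DB_File', 'keys.db', O_RDWR|O_CREAT, 0644, $DB_BTREE
    or die "Cannot tie keys.db: $!";

my $dup = 0;
open my $fh, '<', 'records.txt' or die "Cannot open records.txt: $!";
while (<$fh>) {
    my $key = substr($_, 10, 10);
    # Post-increment returns the old count: 0 (false) the first time a
    # key is seen, 1 or more (true) for every later occurrence.
    $dup++ if $keys{$key}++;
}
close $fh;
untie %keys;

print "found $dup duplicate rows\n";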

Thanx
-Madhu




