Hi,

I have a script that finds duplicate rows in a file. The file has 13 million records, and no more than 5% of them are duplicates.
To detect duplicates I am using the following function:

    my ( %keys, $dup );

    while (<FH>) {
        if ( find_duplicates() ) {
            $dup++;
        }
    }

    # Returns 1 if the record is a duplicate,
    # 0 if it is not.
    sub find_duplicates {
        my $key = substr( $_, 10, 10 );
        if ( exists $keys{$key} ) {
            $keys{$key}++;
            return 1;    # duplicate row
        }
        else {
            $keys{$key}++;
            return 0;    # not a duplicate
        }
    }

So I am storing all 13 million keys in a hash, and I think that is why I am running out of memory. How can I avoid this?

Thanks,
Madhu
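P.S. One direction I have been sketching (untested, and only a rough idea): instead of keeping every key in a Perl hash, let the external Unix sort do the work on disk, then count keys that repeat on adjacent lines. The file names 'records.txt' and 'keys.sorted' below are just placeholders, and it assumes the key is always at characters 11-20 of each record, as in my substr() call.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Placeholder names: adjust to the real input file.
    my $file        = 'records.txt';
    my $sorted_keys = 'keys.sorted';

    # Pass 1: write only the 10-character key of every record to the
    # external Unix sort, which spills to disk instead of holding all
    # 13 million keys in RAM.
    open my $in,   '<',  $file or die "Cannot open $file: $!";
    open my $sort, '|-', 'sort', '-o', $sorted_keys
        or die "Cannot start sort: $!";
    while (<$in>) {
        print {$sort} substr( $_, 10, 10 ), "\n";
    }
    close $in;
    close $sort or die "sort exited abnormally: $?";   # waits for sort to finish

    # Pass 2: identical keys are now adjacent, so a duplicate is simply
    # a key equal to the one on the previous line.
    open my $keys, '<', $sorted_keys or die "Cannot open $sorted_keys: $!";
    my ( $prev, $dup ) = ( '', 0 );
    while ( my $key = <$keys> ) {
        chomp $key;
        $dup++ if $key eq $prev;
        $prev = $key;
    }
    close $keys;

    print "duplicate rows: $dup\n";

This should count duplicates the same way my hash version does (a key seen N times contributes N-1 duplicates), but I have not verified it against the real data, so please correct me if the approach is wrong.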