At 07:57 AM 5/2/2001, you wrote:
>Hi,
>I am reading a flat text file of 100,000 lines. Each line holds at
>most 10 characters of data.
>I want to eliminate duplicate lines and blank lines from that file,
>i.e. something like sort -u in Unix.
>
><snipped>
>
>Is there an easy way of doing it in Perl?
>thanks,
I would think the easiest way would be with a hash: read each line in
as a hash key, then just output the keys. Like this:
my %lines;
while (<FILE>) {
    next if /^\s*$/;    # skip blank (or whitespace-only) lines
    $lines{$_}++;       # count each distinct line as it goes by
}
print OUTFILE keys %lines;
Of course, you would need to open the files beforehand. As a bonus,
you can see how many times each line appeared in the file by printing
each key's value along with the key. (The ++ operator turns each hash
value into a count of how many times that line was seen.)
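Here is a minimal, self-contained sketch of the whole thing, using
lexical filehandles; the filenames input.txt and output.txt are just
placeholders for your own:

use strict;
use warnings;

open my $in,  '<', 'input.txt'  or die "Can't read input.txt: $!";
open my $out, '>', 'output.txt' or die "Can't write output.txt: $!";

my %lines;
while (<$in>) {
    next if /^\s*$/;        # skip blank lines
    $lines{$_}++;           # ++ tallies repeats of each line
}
print {$out} keys %lines;   # note: output order is unspecified

# Optional: report how often each line occurred
printf "%5d  %s", $lines{$_}, $_ for keys %lines;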
If the order you output the lines in is not important, this will work
fine. If you need them sorted, just throw a sort into the print
statement ( print OUTFILE sort keys %lines; ) and you're all set. If
you need the lines in the order they occurred in the original file,
check out the Tie::IxHash or ArrayHashMonster modules (or see the
module-free sketch after the links below):
http://search.cpan.org/search?dist=Tie-IxHash
http://search.cpan.org/search?dist=ArrayHashMonster
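For what it's worth, you can also preserve the original order without
a module; this is the usual idiom (my own sketch, reusing the $in and
$out handles from above):

my (%seen, @in_order);
while (<$in>) {
    next if /^\s*$/;                        # skip blank lines
    push @in_order, $_ unless $seen{$_}++;  # keep first occurrence only
}
print {$out} @in_order;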
Thank you for your time,
Sean.