At 07:57 AM 5/2/2001, you wrote:
>Hi,
>I am reading a flat text file of 100,000 lines. Each line holds at
>most 10 characters of data.
>I want to eliminate duplicate lines and blank lines from that file,
>i.e. something like sort -u in Unix.
>
><snipped>
>
>Is there an easy way of doing it in Perl?
>thanks,
I would think the easiest way would be with a hash: read each line in
as a hash key, then just output the keys. Like this:
my %lines;
while (<FILE>) {
    next if /^\s*$/;    # skip blank (or whitespace-only) lines
    $lines{$_}++;       # count each distinct line as it goes by
}
print OUTFILE keys %lines;
Of course, you would need to open the files beforehand. As a bonus,
you can see how many times each line appeared in the file by printing
each key's value along with the key. (The ++ operator turns each hash
value into a count of how many times that line was seen.)
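Here is a minimal, self-contained sketch of the whole thing, using
lexical filehandles; the filenames input.txt and output.txt are just
placeholders for your own:

use strict;
use warnings;

open my $in,  '<', 'input.txt'  or die "Can't read input.txt: $!";
open my $out, '>', 'output.txt' or die "Can't write output.txt: $!";

my %lines;
while (<$in>) {
    next if /^\s*$/;        # skip blank lines
    $lines{$_}++;           # ++ tallies repeats of each line
}
print {$out} keys %lines;   # note: output order is unspecified

# Optional: report how often each line occurred
printf "%5d  %s", $lines{$_}, $_ for keys %lines;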
If the order you output the lines in is not important, this will work
fine. If you need them sorted, just throw a sort into the print
statement ( print OUTFILE sort keys %lines; ) and you're all set. If
you need the lines in the order they occurred in the original file,
check out the Tie::IxHash or ArrayHashMonster modules (or see the
module-free sketch after the links below):
http://search.cpan.org/search?dist=Tie-IxHash
http://search.cpan.org/search?dist=ArrayHashMonster
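For what it's worth, you can also preserve the original order without
a module; this is the usual idiom (my own sketch, reusing the $in and
$out handles from above):

my (%seen, @in_order);
while (<$in>) {
    next if /^\s*$/;                        # skip blank lines
    push @in_order, $_ unless $seen{$_}++;  # keep first occurrence only
}
print {$out} @in_order;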
Thank you for your time,
Sean.