brahmanyam <[EMAIL PROTECTED]> wrote:
: > Hi,
: > I am reading a flat text file of 100,000 lines. Each line holds at
: > most 10 characters of data.
: > I want to eliminate duplicate lines and blank lines from that file,
: > i.e. something like sort -u in Unix.
: 
: Got plenty of memory? =o)
: 
: open IN, $file or die $!;
: my %uniq;
: while (<IN>) {
:   next unless /\S/;   # skip blank lines, per the original question
:   $uniq{$_}++;
: }
: close IN;
: print sort keys %uniq;

If you want to keep the lines in their original order, you have
a couple of options:

* Keep a hash of lines already seen and push lines that are not
  already in the hash onto an array ([EMAIL PROTECTED] mentioned
  this one already; see the sketch after the next option)

* Use the Tie::IxHash module from CPAN:

  use Tie::IxHash;

  open IN, $file or die $!;
  tie my %uniq => 'Tie::IxHash';
  while (<IN>) {
    next unless /\S/;   # skip blank lines
    $uniq{$_}++;
  }
  print keys %uniq;
  close IN;

  When you tie() a hash to this module, keys() will return the hash keys
  in the order in which they were created.
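
  For completeness, here is a minimal sketch of the first option (the
  %seen and @lines names are just illustrative):

  open IN, $file or die $!;
  my (%seen, @lines);
  while (<IN>) {
    next unless /\S/;                     # skip blank lines
    push @lines, $_ unless $seen{$_}++;   # keep first occurrence only
  }
  close IN;
  print @lines;

  This preserves the original order, keeps the first occurrence of
  each line, and needs nothing beyond core Perl.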

-- tdk
