boll wrote:
I'm trying to write a script to remove duplicate e-mail addresses from a list.
I'd like some help understanding...
1. Why does it remove all of the duplicate lines except one?
2. How can I fix it?

Thanks for any advice,
John
-------------------------------
#!/usr/bin/perl
use warnings;
use strict;

open ALLNAMES, "emails.txt" or die "File: infile failed to open: $!\n";
my @allnames = <ALLNAMES>;

my %seen = ();
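# keep a line only the first time it is seen: $seen{$_}++ is 0 (false)
# on the first encounter, so grep lets that line through and skips later copies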
my @unique = grep { ! $seen{ $_ }++ } @allnames;

print "@unique";

close ALLNAMES or die "cannot close infile";
-----------------------------------------
Here's a small test file with fourteen lines, but only ten unique lines:

[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

-------------------------------

I would guess that your output includes the last line of the file when
you don't expect it to. You are retaining the newline character at the
end of each line. If the final line doesn't end with a newline, it won't
compare equal to the earlier copies of the same address, so grep treats
it as unique and keeps it in the output. To fix this, just

  my @allnames = <ALLNAMES>;
  chomp @allnames;

and then

  print "$_\n" foreach @unique;

at the end.
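
For reference, here is a rough sketch of the whole script with those two
changes folded in (it still reads from emails.txt, as in your original):

  #!/usr/bin/perl
  use warnings;
  use strict;

  open ALLNAMES, "emails.txt" or die "File: infile failed to open: $!\n";
  my @allnames = <ALLNAMES>;
  close ALLNAMES or die "cannot close infile: $!\n";

  chomp @allnames;    # strip newlines so the last line compares equal

  my %seen = ();
  my @unique = grep { ! $seen{ $_ }++ } @allnames;

  print "$_\n" foreach @unique;    # add the newlines back when printing

(With a recent perl you could also use a lexical filehandle and the
three-argument form of open, but that's a separate tidy-up.)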

HTH,

Rob


