Re: Removing duplicate records

John W. Krahn Wed, 01 Aug 2007 08:06:00 -0700

Mihir Kamdar wrote:

Hi,


Hello,

I need your help with the following:

I have a CSV file with many records.

I want to remove duplicate records, but a record need not be entirely
duplicated. I only have to check whether the 2nd, 3rd, 7th and 8th fields
of a record are the same as those of an earlier record. If they are, then
remove either the earlier or the later entry. I have written something
like the code below to achieve this.

#!/usr/bin/perl

open(FILE,"</home/user71/RangerDatasource/Customization/TelekomMalaysia/Scripts/Tests/cprogs/files/sample1");

my $line;
my %hash;
my @file;
while ($line=readline(FILE))
{
 my @cdr=split (/,/, $line) ;
 $hash{$cdr[2],$cdr[3],$cdr[6],$cdr[7]}="@cdr";  #Add some more cdr key fields if u want.
}
close FILE ;
open my $f, '>', 'outputsample1' or
     die 'Failed to open outputsample1';
while (($key, $value) = each %hash)
 {

         print $f  $value."\n";

}
 close $f;

But I am not getting the desired result.
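
A note on why the script above probably misbehaves, as a minimal sketch. The sample record is hypothetical, since the thread does not show the real data. Interpolating "@cdr" joins the fields with a space instead of a comma, and the last field still carries the newline from the input, so adding "\n" when printing produces a blank line after every record. In addition, each %hash returns entries in no particular order, and a later record with the same key overwrites the earlier one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record; the real file layout is not shown in the thread.
my $line = "r1,A,10,20,x,y,30,40,first\n";

my @cdr = split /,/, $line;

# An array interpolated in double quotes is joined with $" (a single
# space by default), so the commas between the fields are lost.
my $stored = "@cdr";

# The last field kept its trailing newline, so the original
# print $f $value."\n" would write a second, empty line.
print $stored;    # prints: r1 A 10 20 x y 30 40 first

# Storing the unmodified line keeps the commas and a single newline.
my %hash;
$hash{ $cdr[2], $cdr[3], $cdr[6], $cdr[7] } = $line;
print values %hash;    # prints: r1,A,10,20,x,y,30,40,first
```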


You don't need two loops for that, just one:

#!/usr/bin/perl

my $in_file ='/home/user71/RangerDatasource/Customization/TelekomMalaysia/Scripts/Tests/cprogs/files/sample1';


open my $in,  '<', $in_file        or die "Cannot open '$in_file' $!";
open my $out, '>', 'outputsample1' or die "Failed to open outputsample1 $!";

my %hash;

while ( <$in> ) {
    my $key = join ',', ( split /,/ )[ 2, 3, 6, 7 ];
    print $out $_ unless $hash{ $key }++;
    }

close $out;
close $in;

__END__
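
The idiom in the loop above can be tried without any files. The following is only a sketch under assumptions: the records and the helper names key_of, dedupe_keep_first and dedupe_keep_last are made up for illustration. $hash{$key}++ returns the old count, so it is false only the first time a key is seen, which keeps the first record of each group. Note also that the list slice is zero-based: [ 2, 3, 6, 7 ] selects the 3rd, 4th, 7th and 8th fields, so if the 2nd, 3rd, 7th and 8th are really wanted, the slice would be [ 1, 2, 6, 7 ].

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical records: r1 and r2 agree on the key fields, r3 does not.
my @records = (
    "r1,A,10,20,x,y,30,40,first\n",
    "r2,B,10,20,p,q,30,40,second\n",
    "r3,C,10,21,x,y,30,40,third\n",
);

# Build the comparison key from the same zero-based indexes as above.
sub key_of {
    my ($line) = @_;
    return join ',', ( split /,/, $line )[ 2, 3, 6, 7 ];
}

# Keep the first record of each key, as the one-loop script does.
sub dedupe_keep_first {
    my @lines = @_;
    my ( %seen, @kept );
    for my $line (@lines) {
        push @kept, $line unless $seen{ key_of($line) }++;
    }
    return @kept;
}

# Keep the last record of each key instead: walk the list backwards,
# then reverse the survivors to restore their original relative order.
sub dedupe_keep_last {
    my @lines = @_;
    my %seen;
    my @kept = grep { !$seen{ key_of($_) }++ } reverse @lines;
    my @in_order = reverse @kept;
    return @in_order;
}

print "first kept:\n", dedupe_keep_first(@records);    # r1 and r3 lines
print "last kept:\n",  dedupe_keep_last(@records);     # r2 and r3 lines
```

Keeping the last record needs the whole input in memory, while keeping the first can stream line by line, which is why the single-loop version suits large files.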



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
