Re: comparing some but not all fields in lists

David Newman Sun, 16 Mar 2008 17:14:45 -0700

Jay Savage wrote:

On Mon, Mar 3, 2008 at 5:32 PM, David Newman <[EMAIL PROTECTED]> wrote:

Greetings. I'm looking to compare two contact lists in csv format, and
 then print out "here are the records in Llist only, in Rlist only,
 and what's in common."


 I should compare only 3 of the 82 fields in each list. There are
 differences in some of the other fields that I should ignore.

 If I read in each csv file as an array, List::Compare does a nice job of
 comparing all 82 fields as a single array element. But I should only
 look at 3 fields, not all 82. (snippet A below)

 I can also use List::Compare plus a split function to strip out just the
 3 fields I'm comparing. However, the resulting arrays then only have
 three fields in each array element. (snippet B below)

 How to compare only selected fields in each list, but then present all
 fields for any matches?

 thanks

 dn
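
One possible way to do this with core Perl only is to index each list by a key built from just the selected fields, while keeping the complete record as the hash value. This is a minimal sketch, not the approach from the thread: the sub names index_by_key and compare_by_key are made up for illustration, and it assumes each record has already been parsed into an array ref of all its fields.

```perl
use strict;
use warnings;

# Build a lookup from the selected key fields to the complete record.
# $records  : array ref of records, each an array ref of all fields
# $key_cols : array ref of 0-based column positions to compare on
sub index_by_key {
    my ($records, $key_cols) = @_;
    my %by_key;
    for my $rec (@$records) {
        # Join the key fields with "\x1F" (ASCII unit separator), which is
        # assumed not to occur in the data. Missing fields count as empty.
        my $key = join "\x1F",
            map { defined $_ ? $_ : '' } @{$rec}[@$key_cols];
        $by_key{$key} = $rec;    # a later duplicate key overwrites an earlier one
    }
    return \%by_key;
}

# Compare two lists on the key columns only.
# Returns three array refs of FULL records: left only, right only, common.
# Records in "common" are taken from the left list.
sub compare_by_key {
    my ($left, $right, $key_cols) = @_;
    my $l = index_by_key($left,  $key_cols);
    my $r = index_by_key($right, $key_cols);

    my @l_only = map { $l->{$_} } grep { !exists $r->{$_} } sort keys %$l;
    my @r_only = map { $r->{$_} } grep { !exists $l->{$_} } sort keys %$r;
    my @common = map { $l->{$_} } grep {  exists $r->{$_} } sort keys %$l;

    return (\@l_only, \@r_only, \@common);
}
```

Called as compare_by_key(\@llist, \@rlist, [0, 1, 6]), this would compare on three columns (the positions here are only an example) and still hand back all 82 fields of every record in each of the three result lists.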


David,

You've gotten some good advice, here, but I have to ask, why reinvent
the wheel? Take a look at DBD::CSV, or at least consider using
Text::CSV to parse the lines for you, instead of relying on split. CSV
can get pretty nasty, especially for name and address data (think Doe,
Jr., John), and those modules are out there.


Thanks all for your comments!

Using one of the CSV modules is a much better way to validate the input of the two contact lists I'm looking to compare.

It's not perfect, though. My input contains some "foreign" characters such as names with accents, tildes, and umlauts.

Both Text::CSV [1] and Tie::Handle::CSV [2] modules return errors when reading this foreign input. Using TextWrangler on the Mac, I've tried saving one CSV input file as Unicode UTF-8 and UTF-16 instead of the default "Western (Mac OS Roman)" but it doesn't help.

The Tie::CSV_File module failed to install from CPAN but I haven't investigated that.

I have 5500 records in one file and about 4500 records in another to compare -- is there some better way than manually deleting "foreign" characters from each file?


thanks again

dn

[1] snippet 1, using Text::CSV. Every line with funny characters is an error.


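use strict;
use warnings;
use Text::CSV;

my $file = 'foo.csv';
# Default options: "binary" is off, so non-ASCII bytes are rejected.
my $csv = Text::CSV->new();

open(my $in, "<", $file) or die "Cannot open $file: $!";

while (my $line = <$in>) {
        if ($csv->parse($line)) {
                my @columns = $csv->fields();
                print "$columns[0] $columns[1] $columns[6]\n";
        } else {
                # error_input returns the line that could not be parsed
                my $err = $csv->error_input;
                print "Failed to parse line: $err";
        }
}
close $in;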
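
For what it's worth, Text::CSV documents a "binary" attribute that has to be switched on before fields may contain characters outside printable ASCII. A minimal sketch of the same parse with that attribute set follows; the sub name parse_csv_line is made up for illustration, and whether this alone is enough for the files described above is an assumption that has not been tested against them.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows fields to contain bytes outside printable ASCII
# (accented letters, embedded newlines) instead of failing the parse.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object";

# Parse one CSV line; return its fields, or an empty list on failure.
sub parse_csv_line {
    my ($line) = @_;
    return $csv->parse($line) ? $csv->fields() : ();
}
```

If the accented names then print as the wrong characters, the file's encoding may also need declaring when it is opened, for example with an I/O layer such as "<:encoding(MacRoman)" for a file saved as Western (Mac OS Roman). Which encoding applies depends on how the file was actually saved.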

[2] snippet 2, using Tie::Handle::CSV. This code dies on the first instance of a line with a funny character:


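use strict;
use warnings;
use Tie::Handle::CSV;

my $file = 'foo.csv';
# header => 1: use the first line of the file as column names
my $fh = Tie::Handle::CSV->new($file, header => 1);

while (my $csv_line = <$fh>) {
        # each line can be accessed as a hash keyed by column name
        print $csv_line->{'First Name'} . " " . $csv_line->{'LastName'} . "\n";
}

close $fh;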

thanks again

dn


