Jay Savage wrote:
On Mon, Mar 3, 2008 at 5:32 PM, David Newman <[EMAIL PROTECTED]> wrote:
Greetings. I'm looking to compare two contact lists in csv format, and
 then print out "here are the records in Llist only, in Rlist only,
 and what's in common."

 I should compare only 3 of the 82 fields in each list. There are
 differences in some of the other fields that I should ignore.

 If I read in each csv file as an array, List::Compare does a nice job of
 comparing all 82 fields as a single array element. But I should only
 look at 3 fields, not all 82. (snippet A below)

 I can also use List::Compare plus a split function to strip out just the
 3 fields I'm comparing. However, the resulting arrays then only have
 three fields in each array element. (snippet B below)

 How to compare only selected fields in each list, but then present all
 fields for any matches?
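
One way to approach the question above, offered as a minimal sketch rather than a definitive answer: build a hash for each list, keyed on only the selected fields, with the full record as the value. It uses plain hashes instead of List::Compare. The sub names (index_by_key, compare_by_key) and the column positions passed in are hypothetical, and each record is assumed to be an array ref of fields that has already been parsed from the CSV.

```perl
use strict;
use warnings;

# Build a lookup from "key made of the selected fields" to the full record.
# $records is an array ref of array refs; $cols lists the column indexes
# to compare on (hypothetical positions, adjust to the real layout).
sub index_by_key {
    my ($records, $cols) = @_;
    my %index;
    for my $rec (@$records) {
        # Join with "\x1F" (ASCII unit separator), which is unlikely to
        # appear in contact data; missing fields are treated as empty.
        my $key = join "\x1F", map { defined $_ ? $_ : '' } @{$rec}[@$cols];
        $index{$key} = $rec;   # a later duplicate key replaces an earlier one
    }
    return \%index;
}

# Return three array refs of full records:
# left-only, right-only, and records whose key appears in both lists
# (the left list's copy is returned for the common ones).
sub compare_by_key {
    my ($left, $right, $cols) = @_;
    my $l = index_by_key($left,  $cols);
    my $r = index_by_key($right, $cols);
    my @l_only = map { $l->{$_} } grep { !exists $r->{$_} } sort keys %$l;
    my @r_only = map { $r->{$_} } grep { !exists $l->{$_} } sort keys %$r;
    my @both   = map { $l->{$_} } grep {  exists $r->{$_} } sort keys %$l;
    return (\@l_only, \@r_only, \@both);
}
```

Calling compare_by_key(\@llist, \@rlist, [0, 1, 6]) would compare on columns 0, 1 and 6 only, while every returned record still carries all of its fields.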

 thanks

 dn



David,

You've gotten some good advice here, but I have to ask, why reinvent
the wheel? Take a look at DBD::CSV, or at least consider using
Text::CSV to parse the lines for you, instead of relying on split. CSV
can get pretty nasty, especially for name and address data (think Doe,
Jr., John), and those modules are out there.
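
To illustrate the point about names such as "Doe, Jr., John", here is a small sketch comparing a plain split with Text::CSV on one made-up record. The sample line is invented for illustration; the only Text::CSV calls used are new, parse, fields and error_input.

```perl
use strict;
use warnings;
use Text::CSV;

# A made-up record whose first field contains commas inside quotes.
my $line = '"Doe, Jr., John",123 Main St,Springfield';

# Naive approach: split on every comma, including the quoted ones.
my @naive = split /,/, $line;

# Text::CSV honours the quoting and keeps the name as one field.
my $csv = Text::CSV->new() or die "Cannot create Text::CSV object";
$csv->parse($line) or die "Parse failed: " . $csv->error_input;
my @fields = $csv->fields();

printf "split:     %d fields\n", scalar @naive;
printf "Text::CSV: %d fields, first is '%s'\n", scalar @fields, $fields[0];
```

The split version breaks the quoted name into several pieces, so every later column index shifts; the parsed version keeps the record aligned.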

Thanks all for your comments!

Using one of the CSV modules is a much better way to validate the input of the two contact lists I'm looking to compare.

It's not perfect, though. My input contains some "foreign" characters such as names with accents, tildes, and umlauts.

Both the Text::CSV [1] and Tie::Handle::CSV [2] modules return errors when reading this input. Using TextWrangler on the Mac, I've tried saving one CSV input file as Unicode UTF-8 and as UTF-16 instead of the default "Western (Mac OS Roman)", but that doesn't help.

The Tie::CSV_File module failed to install from CPAN, but I haven't investigated that.

I have 5500 records in one file and about 4500 records in another to compare -- is there some better way than manually deleting "foreign" characters from each file?

thanks again

dn

[1] Snippet 1, using Text::CSV. Every line containing accented or other non-ASCII characters is reported as a parse error.

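use strict;
use warnings;
use Text::CSV;

my $file = 'foo.csv';
my $csv  = Text::CSV->new();

# Lexical filehandle instead of the bareword CSV
open(my $fh, "<", $file) or die "Cannot open $file: $!";

while (my $line = <$fh>) {
    if ($csv->parse($line)) {
        my @columns = $csv->fields();
        print "$columns[0] $columns[1] $columns[6]\n";
    } else {
        # error_input returns the line that failed to parse
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close $fh;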
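
A possible explanation for the failures in snippet 1, offered as a sketch and not tested against the actual contact files: Text::CSV documents a binary attribute, and without it fields containing bytes outside printable ASCII can be rejected. The version below sets binary => 1 and decodes the file through a PerlIO encoding layer. The sub name read_records and the choice of 'MacRoman' as the encoding name are assumptions; use whatever encoding the file was really saved in.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows characters outside printable ASCII inside fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object";

# Parse a single line; returns the fields, or an empty list on failure.
sub parse_line {
    my ($line) = @_;
    return $csv->parse($line) ? $csv->fields() : ();
}

# Read a whole file, decoding it from the given encoding (assumed name,
# e.g. 'MacRoman' or 'UTF-8'), and return an array ref of rows.
sub read_records {
    my ($file, $encoding) = @_;
    open my $fh, "<:encoding($encoding)", $file
        or die "Cannot open $file: $!";
    my @records;
    while (my $row = $csv->getline($fh)) {
        push @records, $row;    # each $row is an array ref of fields
    }
    close $fh;
    return \@records;
}
```

If this works on the real data, read_records('foo.csv', 'MacRoman') would return every row with its accented names intact, so nothing has to be deleted by hand.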

[2] Snippet 2, using Tie::Handle::CSV. This code dies on the first line that contains a non-ASCII character:

use strict;
use warnings;
use Tie::Handle::CSV;

my $file = 'foo.csv';
my $fh   = Tie::Handle::CSV->new($file, header => 1);

while (my $csv_line = <$fh>) {
    print $csv_line->{'First Name'} . " " . $csv_line->{'Last Name'} . "\n";
}

close $fh;



