Hi, Joshua, :)
On Thu, 16 Jan 2003, Scott, Joshua wrote:
> I've got a CSV file which I need to process. The format is as follows.
>
> "Smith, John J",1/1/2002,1/15/2002,"Orlando, FL",Florida
> "Doe, John L",1/1/2002,1/15/2002,Los Angeles, California
>
> I've tried splitting it using: @row = split(",",$data);
> The problem is with the fields that contain the commas between the
> quotes. It's splitting the fields at each of these commas as well
> and I'd like to know how to avoid that.
The suggestions for using a module tailored for this purpose are the
way to go. However, as a learning exercise, here's what I came up
with to satisfy your requirements:
#!/usr/bin/perl
use strict;
use warnings;
# Split CSV lines, which may have commas embedded in quoted strings.
my @lines = ( q("Smith, John J",1/1/2002,1/15/2002,"Orlando, FL",Florida),
q("Doe, John L",1/1/2002,1/15/2002,Los Angeles, California) );
my @fields;
my $qs = q("');
my $sep = ",";
# use re 'debug';    # Uncomment to trace how the regex engine matches.
foreach (@lines) {
    # Simple split for strings that don't contain quotes.
    if( index( $_, q(") ) == -1 and
        index( $_, q(') ) == -1 ) {
        push @fields, [ split( ',', $_ ) ];
        next;    # Done with this line; don't fall through to the regex.
    }
    # Regex for others.
    print "$_\n";
    my @matches;
    while( /                # EITHER:
             ([$qs])        # A quote character.
             ([^$qs]+?)     # Followed by a bunch of non-quote chars.
             \1             # And ending with the same quote char.
           |                # OR:
             $sep?          # Optionally the separator character.
             ([^$sep]+?)    # Followed by a bunch of non-separator chars.
             (?:$sep|$)     # Then the separator char or the end of the string.
           /gx ) {
        # Throw away $1 - only used to bracket embedded quotes.
        # Test with defined() so that a field containing "0" is not lost.
        push( @matches, defined $2 ? $2 : $3 );
    }
    push @fields, \@matches if @matches;
}
# Print one row per line, with " | " marking the field boundaries.
print join( ' | ', @{$_} ), "\n" foreach @fields;
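One caveat on the regex above: it silently drops empty fields (a,,b comes
back as two fields), and a quoted field that directly follows another
quoted field keeps its quotes. If that matters for your data, a
character-by-character scan is one alternative. This is only a rough
sketch; split_csv_line is a name I made up for illustration, and it
assumes double quotes only, with no escaped quotes inside a field:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): split one CSV line,
# treating commas inside double quotes as ordinary characters.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;
    foreach my $ch ( split //, $line ) {
        if ( $ch eq '"' ) {
            $in_quotes = !$in_quotes;    # Toggle state; the quote itself is dropped.
        }
        elsif ( $ch eq ',' and not $in_quotes ) {
            push @fields, $field;        # A separator outside quotes ends the field.
            $field = '';
        }
        else {
            $field .= $ch;
        }
    }
    push @fields, $field;                # The last field has no trailing separator.
    return @fields;
}

print join( ' | ', split_csv_line( q("Smith, John J",1/1/2002,,"Orlando, FL","FL") ) ), "\n";
```

Because every comma outside quotes ends a field, the empty third field
and the two adjacent quoted fields in that test line both survive.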
Hope that is enlightening. I'm sure there are better ways of doing
it, but I'm hardly an "expert" myself!
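For comparison, the module route might look roughly like this. It is a
minimal sketch using Text::ParseWords, which ships with Perl; I picked it
only because it needs no installation, and a dedicated CSV module from
CPAN may well suit real data better. Note that parse_line also treats
single quotes as quote characters and backslashes as escapes, so I'm
assuming your data contains neither:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my @lines = ( q("Smith, John J",1/1/2002,1/15/2002,"Orlando, FL",Florida),
              q("Doe, John L",1/1/2002,1/15/2002,Los Angeles, California) );

foreach my $line (@lines) {
    # parse_line( delimiter, keep, line ): a false "keep" strips the quotes.
    my @row = parse_line( ',', 0, $line );
    print join( ' | ', @row ), "\n";
}
```

The unquoted "Los Angeles, California" in the second line is still split
at its comma, since nothing in the data marks it as a single field.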
---Jason