[EMAIL PROTECTED] wrote:
> Hello:
> I'm trying to build a routine to split a string into fields by a
> specified delimiter. The string format is pretty close to CSV, except that
> quoted substrings can appear within an unquoted string, and escaped quotes
> can exist within quoted strings, and the delimiter might exist within a
> quoted string (like in CSV). Specifically, it's to split recipient lists from
> e-mail "To:" headers. So for example:
>
> "LastName, FirstName" <address>, "Name" <address>, <address>; FirstName
> LastName, address; "First \"nick\" Last" address
>
> The above string should be split into:
> "LastName, FirstName" <address>
> "Name" <address>
> <address>
> FirstName LastName
> address
> "First \"nick\" Last" address
>
> All unquoted surrounding whitespace should be removed. I've gotten as far as
> this:
>
> # modified from the Perl Cookbook
> push(@list, $+)
> while $text =~
> /\s*("[^\"\\]*(?:\\.[^\"\\]*)*"(?:\s+[^;,]+))\s*[;,]?\s*|\s*([^;,]+)\s*[;,]?\s*|[;,]\s*/g;
> push(@list, undef) if (substr($text, -1, 1) =~ /[;,]/);
>
>
> But since the matches seem to be too greedy, it keeps trailing space before
> the delimiters. Can someone offer a better solution?
>
> NOTE: I want it to be as generic as possible as I cannot expect the elements
> in the list to follow strict guidelines (there are too many broken programs
> out there and too many idiots!)
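On the trailing-space problem itself: in both branches of the regex above, the captured part ends in [^;,]+, and that character class also matches whitespace. The greedy + therefore takes the space before the delimiter, and the following \s* has nothing left to match. The small sketch below reproduces this on a made-up sample string using only the unquoted branch, and shows one way to force the capture to end on a non-space character. It addresses only the trailing whitespace, not the quoting problems.

```perl
use strict;
use warnings;

# Made-up sample; only the unquoted-field branch of the question's regex is used.
my $text = 'FirstName LastName , address ;x';

# [^;,] also matches whitespace, so the greedy + swallows the space before
# each delimiter and the trailing \s* matches the empty string.
my @greedy = $text =~ /\s*([^;,]+)\s*[;,]?/g;
# ('FirstName LastName ', 'address ', 'x')

# Requiring the capture to end on a non-space, non-delimiter character
# hands the trailing whitespace back to \s*.
my @fixed = $text =~ /\s*([^;,]*[^;,\s])\s*[;,]?/g;
# ('FirstName LastName', 'address', 'x')
```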
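OK, this isn't going to be a one-liner. It's easier to modify the data a bit
before splitting: temporarily replace escaped quotes, the quote characters
themselves, and any delimiters inside quotes with control characters, split on
the remaining , and ;, then restore everything in each field: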
use strict;
use warnings;

$_ = q{"LastName, First Name" <address>, "Name" <address>, <address>; FirstName
LastName, address; "First \"nick\" Last" address};
s/\s*\n\s*/ /g;                   # unfold the wrapped header line
s/\\"/\003/g;                     # hide embedded \"
s/"([^"]*)"/\001$1\002/g;         # mark opening/closing "
# hide every , and ; inside "...", not just one per quoted string
s{\001([^\002]*)\002}{ (my $q = $1) =~ tr/,;/\004\005/; "\001$q\002" }ge;
my @f = split /[,;]/;
foreach (@f) {
    s/^\s+|\s+$//g;               # trim surrounding whitespace
    s/\001|\002/"/g;              # restore quotes
    s/\003/\\"/g;                 # restore \"s
    s/\004/,/g;                   # restore embedded ,s
    s/\005/;/g;                   # restore embedded ;s
    print "\t$_\n";
}
__END__
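If you would rather not rely on placeholder control characters, the same split can be done in one pass with a \G and /gc tokenizer that reads either a quoted string, an unquoted delimiter, or a run of ordinary text. This is an alternative sketch, not the approach above. split_recipients is a hypothetical name, and it assumes that only double quotes group text (an apostrophe as in O'Brien is an ordinary character), that backslash escapes matter only inside quotes, and that empty fields should be dropped rather than returned as undef. It makes no attempt at full RFC 5322 parsing (comments in parentheses, groups).

```perl
use strict;
use warnings;

# Hypothetical helper: split a recipient list on , or ; that are NOT inside
# double quotes. Quoted strings may contain \-escaped characters such as \".
sub split_recipients {
    my ($text) = @_;
    my @fields;
    my $field = '';
    # Consume the string token by token. Every character starts exactly one
    # token, so the loop always reaches the end of the string:
    #   $1 = quoted string (closing quote optional, to tolerate broken input)
    #   $2 = unquoted delimiter
    #   $3 = run of ordinary characters
    while ($text =~ /\G(?:("(?:[^"\\]|\\.)*"?)|([,;])|([^",;]+))/gcs) {
        if (defined $2) {
            push @fields, $field;    # delimiter outside quotes ends the field
            $field = '';
        }
        else {
            $field .= defined $1 ? $1 : $3;
        }
    }
    push @fields, $field;
    for (@fields) {
        s/\s*\n\s*/ /g;              # unfold wrapped header lines into one space
        s/^\s+|\s+$//g;              # trim surrounding whitespace
    }
    return grep { length } @fields;  # drop empty fields, e.g. after a trailing ;
}

my $to = q{"LastName, FirstName" <address>, "Name" <address>, <address>; FirstName
LastName, address; "First \"nick\" Last" address};
print "$_\n" for split_recipients($to);
```

With the sample header from the question, this prints the six fields listed there, one per line. Because the quoted-string token is matched as a unit, a ; inside quotes is protected the same way as a comma.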
_______________________________________________
ActivePerl mailing list