Hello:
I'm trying to build a routine to split a string into fields by a
specified delimiter. The string format is pretty close to CSV, except that
quoted substrings can appear within an unquoted string, escaped quotes can
exist within quoted strings, and the delimiter might exist within a quoted
string (as in CSV). Specifically, it's to split recipient lists from e-mail
"To:" headers, where the delimiter is a comma or a semicolon. So for example:
"LastName, FirstName" <address>, "Name" <address>, <address>; FirstName LastName, address; "First \"nick\" Last" address
The above string should be split into:
"LastName, FirstName" <address>
"Name" <address>
<address>
FirstName LastName
address
"First \"nick\" Last" address
All unquoted surrounding whitespace should be removed. I've gotten as far as
this:
# modified from the Perl Cookbook
push(@list, $+)
while $text =~
/\s*("[^\"\\]*(?:\\.[^\"\\]*)*"(?:\s+[^;,]+))\s*[;,]?\s*|\s*([^;,]+)\s*[;,]?\s*|[;,]\s*/g;
push(@list, undef) if (substr($text, -1, 1) =~ /[;,]/);
But the matches seem to be too greedy, so it keeps the trailing whitespace
before the delimiters. Can someone offer a better solution?
NOTE: I want it to be as generic as possible as I cannot expect the elements
in the list to follow strict guidelines (there are too many broken programs out
there and too many idiots!)
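One direction that might sidestep the greediness is to stop matching whole
fields and instead walk the string token by token, trimming each field only
once it is complete. The following is a minimal sketch under that assumption,
not a tested solution for every broken header. The names split_recipients and
trim are hypothetical, and unlike the snippet above it drops empty fields
rather than pushing undef for a trailing delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: split a recipient list on unquoted ',' or ';'.
sub split_recipients {
    my ($text) = @_;
    my @list;
    my $field = '';

    # Consume one token per iteration, anchored at the previous match (\G):
    #   $1  a quoted string, backslash escapes allowed inside
    #   $2  a run of characters that are not quotes or delimiters
    #   $3  a delimiter (',' or ';')
    #   $4  a stray quote with no closing partner (broken input)
    while ($text =~ /\G(?:("[^"\\]*(?:\\.[^"\\]*)*")|([^",;]+)|([,;])|("))/gcs) {
        if (defined $3) {
            # An unquoted delimiter ends the current field.
            push @list, trim($field);
            $field = '';
        }
        else {
            $field .= defined $1 ? $1 : defined $2 ? $2 : $4;
        }
    }
    push @list, trim($field);

    # Assumption: empty fields (e.g. from ",," or a trailing ";") are dropped.
    return grep { length } @list;
}

# Usage with the example recipient list from above.
my $to = q{"LastName, FirstName" <address>, "Name" <address>, <address>; }
       . q{FirstName LastName, address; "First \"nick\" Last" address};
print "$_\n" for split_recipients($to);
```

Because the whitespace is trimmed per field after the delimiter is seen, the
regex itself never has to decide where trailing space ends, and whitespace
inside a quoted string sits behind the quote characters and is left alone.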
Thanks!
dZ.
--
Those whom the gods would destroy, they first make mad.