[EMAIL PROTECTED] wrote:

> Hello:
>       I'm trying to build a routine to split a string in fields by a 
> specified delimiter.  The string format is pretty close to CSV, except that 
> quoted substrings can appear within an unquoted string, and escaped quotes 
> can exist within quoted strings, and the delimiter might exist within a 
> quoted string (like in CSV).  Specifically, it's to split recipient lists from 
> e-mail "To:" headers.  So for example:
> 
> "LastName, FirstName" <address>, "Name" <address>, <address>; FirstName 
> LastName, address; "First \"nick\" Last" address
> 
> The above string should be split into:
>       "LastName, FirstName" <address>
>       "Name" <address>
>       <address>
>       FirstName LastName
>       address
>       "First \"nick\" Last" address
> 
> All unquoted surrounding whitespace should be removed.  I've gotten so far as 
> this:
> 
> # modified from the Perl Cookbook
> push(@list, $+)
>     while $text =~ 
> /\s*("[^\"\\]*(?:\\.[^\"\\]*)*"(?:\s+[^;,]+))\s*[;,]?\s*|\s*([^;,]+)\s*[;,]?\s*|[;,]\s*/g;
> push(@list, undef) if (substr($text, -1, 1) =~ /[;,]/);
> 
> 
> But since the matches seem to be too greedy, it keeps trailing space before 
> the delimiters.  Can someone offer a better solution?
> 
> NOTE:  I want it to be as generic as possible as I cannot expect the elements 
> in the list to follow strict guidelines (there are too many broken programs 
> out there and too many idiots!)

OK, this isn't going to be a one-liner. It's easier to modify the data
a bit before splitting, so that you get what you want and can apply
some rules to it:

# built from two pieces so that no line break ends up inside the data
$_ = q{"LastName, FirstName" <address>, "Name" <address>, <address>; }
   . q{FirstName LastName, address; "First \"nick\" Last" address};

s/\\"/\003/g;                                   # handle embedded \"
s/"([^"]*)"/\001$1\002/g;                       # handle open/close " (and empty "")
# handle commas in quotes "...,..."; loop because each pass only
# replaces the last remaining comma in a quoted string
1 while s/(\001[^\002]*),([^\002]*\002)/$1\004$2/g;
my @f = split /[,;]/;

foreach (@f) {
        s/^\s+|\s+$//g;         # trim unquoted surrounding whitespace
        s/\001|\002/"/g;        # restore quotes
        s/\003/\\"/g;           # restore \"s
        s/\004/,/g;             # restore embedded ,s

        print "\t$_\n";
}

__END__
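
For comparison, here is a rough sketch of a different technique from the
placeholder substitution above: walk the string token by token, so that a
quoted string (including \" escapes) is consumed as one unit and any , or ;
inside it never counts as a delimiter. The sub name split_recipients is
made up for this example, and dropping empty fields is an assumption; keep
them if a trailing delimiter should produce an empty entry.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post.
sub split_recipients {
    my ($text) = @_;
    my @fields = ('');

    # Each token is a complete quoted string, a delimiter, or any
    # other single character (including an unbalanced quote).
    while ($text =~ /("(?:[^"\\]|\\.)*")|([,;])|(.)/gs) {
        if (defined $2) {
            push @fields, '';                       # delimiter: start a new field
        }
        else {
            $fields[-1] .= defined $1 ? $1 : $3;    # extend the current field
        }
    }

    s/^\s+|\s+$//g for @fields;     # trim unquoted surrounding whitespace
    return grep { length } @fields; # assumption: empty fields are dropped
}

my $to = q{"LastName, FirstName" <address>, "Name" <address>, <address>; }
       . q{FirstName LastName, address; "First \"nick\" Last" address};

print "\t$_\n" for split_recipients($to);
```

Because the quoted-string alternative is tried first, semicolons inside
quotes are protected as well, and a stray unbalanced quote simply falls
through to the single-character case instead of breaking the split.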
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
