In a message dated 12/10/2005 7:04:59 P.M. Eastern Standard Time, [EMAIL PROTECTED] writes:
 
> Hello:
>     I'm trying to build a routine to split a string into fields by a specified delimiter.
> The string format is pretty close to CSV, except that quoted substrings can appear within an
> unquoted string, and escaped quotes can exist within quoted strings, and the delimiter might
> exist within a quoted string (like in CSV).  Specifically, it's to split recipient lists from
> e-mail "To:" headers.  So for example:
>
> "LastName, FirstName" <address>, "Name" <address>, <address>; FirstName LastName, address; "First \"nick\" Last" address
>
> The above string should be split into:
>     "LastName, FirstName" <address>
>     "Name" <address>
>     <address>
>     FirstName LastName
>     address
>     "First \"nick\" Last" address
>
> All unquoted surrounding whitespace should be removed.  I've gotten so far as this:
>
> # modified from the Perl Cookbook
> push(@list, $+)
>     while $text =~ /\s*("[^\"\\]*(?:\\.[^\"\\]*)*"(?:\s+[^;,]+))\s*[;,]?\s*|\s*([^;,]+)\s*[;,]?\s*|[;,]\s*/g;
> push(@list, undef) if (substr($text, -1, 1) =~ /[;,]/);
>
>
> But since the matches seem to be too greedy, it keeps trailing space before the delimiters.  Can someone offer a better solution?
>
> NOTE:  I want it to be as generic as possible as I cannot expect the elements in the list to follow strict guidelines (there are too many broken programs out there and too many idiots!)
>
>     Thanks!
>     dZ.
hi dZ -  
 
maybe try something like:  
 
=================== begin code =====================

use warnings;
use strict;
 
my $text = q(    "LastName, FirstName" <address>   ,  "Name" <address>, <add  ress>; FirstName LastName, address   ; "First \"nick\" Last" ad dre ss   );
 
# The above string should be split into
# (whitespace inside a field is kept as-is):
#     "LastName, FirstName" <address>
#     "Name" <address>
#     <add  ress>
#     FirstName LastName
#     address
#     "First \"nick\" Last" ad dre ss
 
# All unquoted surrounding whitespace should be removed.
 
# modified from modification from the Perl Cookbook
 
my @list;
 
# a delimiter:
#   contains a single delimiter character;
#   may have any amount of whitespace before or after delimiter character.
my $delimiters = q(;,);                       # list of delimiter characters
my $delimiter = qr/ \s* [$delimiters] \s* /x; # delimiter sequence (needs /x: the spaces are layout only)
 
# a double-quoted string:
#   includes opening and closing double-quotes;
#   may be empty;
#   may contain any characters, including delimiter characters;
#   may have backslash-escaped characters, including escaped double-quotes.
my $quoted_string = qr/ " [^"\\]* (?: \\. [^"\\]* )* " /x;
 
# an address:
#   must contain at least one character;
#   may not contain any delimiter character;
#   may have embedded whitespace, but may not begin or end with ws.
my $not_delimiter_or_ws = qr/[^$delimiters\s]/;
my $address = qr/ $not_delimiter_or_ws+ (?: \s+ $not_delimiter_or_ws+ )* /x;
 
 
    # while $text =~ / \s*                                    # optional ws
    #                  ( " [^\"\\]* (?: \\. [^\"\\]* )* "     # quoted string...
    #                    (?: \s+ [^;,]+ )                     # then address (unnecessary grouping)
    #                  )
    #                  \s*                                    # optional ws
    #                  [;,]?                                  # optional delimiter
    #                  \s*                                    # optional ws
    #                |                                    # or
    #                  \s*                                    # optional ws
    #                  ( [^;,]+ )                             # address
    #                  \s*                                    # optional ws
    #                  [;,]?                                  # optional delimiter
    #                  \s*                                    # optional ws
    #                |                                    # or
    #                  [;,]                                   # required delimiter
    #                  \s*                                    # optional ws
    #                /xg;
 
push(@list, $+)
    while $text =~ /                            # either
                     \s*                            # optional ws
                     (                              # capture...
                       $quoted_string                   # quoted string
                       \s+                              # required ws
                       $address                         # address
                     )
                     $delimiter?                    # optional delimiter
                   |                            # or
                     \s*                            # optional ws
                     (                              # capture...
                       $address                         # address
                     )
                     $delimiter?                    # optional delimiter
                   |                            # or
                     $delimiter                     # required delimiter
                   /xg;
 
push(@list, undef) if (substr($text, -1, 1) =~ /[;,]/);  # i'm not sure just what this is for
 
{ local $" = "]\n[";  print "[@list]\n"; }
 
================= end code =========================
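
If quoted substrings can turn up anywhere in a field (not only at the front), a small scanner may be easier to keep generic than one big alternation. What follows is only a sketch under a few assumptions: the sub name split_recipients is made up for illustration, a quote with no closing partner is treated as an ordinary character, and empty fields are dropped rather than pushed as undef.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: split a recipient list on
# unquoted ';' or ',' and trim surrounding whitespace from each field.
sub split_recipients {
    my ($text) = @_;
    my @fields;
    my $field = '';

    # Walk the string with \G and /gc so each branch resumes where the
    # previous match stopped; every branch consumes at least one character.
    while (1) {
        if ($text =~ /\G ( " [^"\\]* (?: \\. [^"\\]* )* " ) /xgcs) {
            $field .= $1;                  # complete quoted string, kept verbatim
        }
        elsif ($text =~ /\G ( [^";,]+ ) /xgc) {
            $field .= $1;                  # run of ordinary characters
        }
        elsif ($text =~ /\G [;,] /xgc) {
            push @fields, $field;          # unquoted delimiter ends the field
            $field = '';
        }
        elsif ($text =~ /\G " /xgc) {
            $field .= '"';                 # stray quote with no closing partner
        }
        else {
            last;                          # end of string
        }
    }
    push @fields, $field;

    # Trim whitespace around each field; whitespace inside is left alone.
    for my $f (@fields) {
        $f =~ s/^\s+//;
        $f =~ s/\s+$//;
    }
    return grep { length } @fields;        # assumption: empty fields are dropped
}

print "[$_]\n" for split_recipients(q(  "Last, First" <a@example.com> ; <b@example.com>,  c d  ));
```

Because trimming happens after the split, the greedy-match problem from the original question does not come up: the scanner never has to decide whether trailing spaces belong to the field or to the delimiter.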

hth -- bill walters  
 
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
