[EMAIL PROTECTED] <> wrote:
> On Dec 12, 2005, at 04:40, $Bill Luebkert wrote:
> 
> DZ-Jay wrote:
> 
> But I did specify the rules:
> 1. Split on a specified delimiter (for the moment I'm aiming for
> [,;]), but ideally I would like it to be variable. 
> 2. As opposed to CSV where quoted strings encompass the entire field,
> quoted substrings can exist within the field. 
> 3. The quotes surrounding a substring are part of the field and
> should not be removed. 
> 4. Escaped quotes (\") can exist within quoted substrings.
> 
> Once or more than once ?
> 
> More than once, as the example shows.
> 
> 5. The delimiter can exist within the quoted substrings.
> 
> One or more ?  More than one can complicate.
> 
> More than once.  It's not as complicated as you may think; the first
> match in my regexp catches those by accepting absolutely *anything*
> within quotation marks (not even looking for delimiters there).  
> 
> And this is an example that covers all the rules:
> 
> $foo = qq!"LastName, FirstName" <address>, "Name" <address>  ,
> <address>; FirstName LastName , address; "First \"nick\" Last" 
> address!;
> 
> Are the <>'s literal or do they indicate a generic address and what
> characters can exist in the name and address besides \w chars ? 
> 
> The <>'s are literal and can contain any characters within it.  As I
> mentioned in my first e-mail, I want this to be generic and to adhere
> to the rules I mentioned, without any expectations on what characters
> to find in any particular place.   
> 
> Making a single RE to handle this would take too much effort when you
> have a simpler solution and since it's time consuming, I'll pass for
> now (plus I'm no expert at REs - I'm self taught).  
> 
> I agree, and I am also self-taught and certainly not an expert at
> regexps.  I would like to get a single regexp, if possible, as a
> matter of challenge and learning.  As I mentioned in my first post,
> the regexp I came up with, after modifying the CSV-split example in
> the Perl Cookbook, worked beautifully -- all except for trailing
> whitespace of a field (spaces before the delimiter).  It would have
> been even easier for me to just apply s/\s+$//; to each field at the end,
> but I want to learn what I missed, and perhaps improve my work.       

First, could you modify your posting style? It is impossible to
determine from the above who said what.

It can be done with a single regexp, but it's an ugly one (see the output
from the script below). Regexps are no different from other problems:
they are easier to manage if you decompose them into simpler ones. Also,
try to reuse the work done by others. For example, the following seems
to do what you ask. Note that the quoting operator you used (qq!...!)
unescapes the embedded quotes, so the script uses q!...! instead.

-----------------------------------------------------
use strict;
use warnings;

use Regexp::Common qw/delimited balanced/;

my $str = q!"LastName, FirstName" <address>, "Name" <address>  ,
<address>; FirstName LastName , address; "First \"nick\" Last" address!;

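# A run of characters containing no separator and no whitespace.
my $notsepRE = qr{[^,;\s]+};
# A double-quoted substring; backslash-escaped quotes are allowed inside.
my $quotedRE = $RE{delimited}{-delim => '"'};
# A "word" is either a quoted substring or a run of non-separator characters.
my $wordRE = qr{$quotedRE|$notsepRE};
# A field is one or more words separated by whitespace, so whitespace
# before or after a field is never captured.
my $fieldRE = qr{$wordRE(?:\s+$wordRE)*};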

print "Regexp: $fieldRE\n\n";

while ($str =~ /($fieldRE)/g) {
    print "[$1]\n";
}
-----------------------------------------------------
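
If Regexp::Common is not installed, the quoted-substring piece can be
written by hand. The following is only a rough sketch under that
assumption: the hand-written $quotedRE is my own stand-in for
$RE{delimited}{-delim => '"'}, not the pattern the module generates, and
it assumes backslash is the only escape character.

```perl
use strict;
use warnings;

# Same test string as in the script above; q!...! keeps the backslashes.
my $str = q!"LastName, FirstName" <address>, "Name" <address>  ,
<address>; FirstName LastName , address; "First \"nick\" Last" address!;

# Hand-written stand-in (an assumption, not the module's own pattern):
# a double quote, then any mix of ordinary characters and
# backslash-escaped characters, then the closing double quote.
my $quotedRE = qr{"[^"\\]*(?:\\.[^"\\]*)*"}s;
my $notsepRE = qr{[^,;\s]+};
my $wordRE   = qr{$quotedRE|$notsepRE};
my $fieldRE  = qr{$wordRE(?:\s+$wordRE)*};

# Collect the fields instead of printing them straight away.
my @fields;
while ($str =~ /($fieldRE)/g) {
    push @fields, $1;
}

print "[$_]\n" for @fields;
```

With the test string above this should yield six fields, with the quotes
and the escaped quotes left in place and no trailing whitespace.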

On the whole, however, I think that $Bill's suggestion of pre- and
post-processing the data to make it easier to parse may turn out to be
the simpler, easier-to-understand solution in the end.
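
To illustrate the general idea of pre- and post-processing, here is a
minimal sketch. It is my own guess at the shape of such an approach, not
$Bill's code: quoted substrings are hidden behind placeholders, the
string is split on the delimiters, and the quoted text is restored
afterwards. The NUL-delimited placeholder and the @quoted array are
hypothetical choices, and the sketch assumes NUL never occurs in the
input.

```perl
use strict;
use warnings;

my $str = q!"LastName, FirstName" <address>, "Name" <address>  ,
<address>; FirstName LastName , address; "First \"nick\" Last" address!;

# Pre-process: swap each quoted substring for a placeholder of the form
# NUL, index, NUL, which cannot contain a delimiter.
# Assumption: the input never contains a NUL character.
my @quoted;
(my $masked = $str) =~ s{("[^"\\]*(?:\\.[^"\\]*)*")}{
    push @quoted, $1;
    "\0" . $#quoted . "\0";
}gse;

# Split on , or ; and swallow the whitespace on either side.
my @fields = split /\s*[,;]\s*/, $masked;

# Post-process: put the original quoted substrings back.
s/\0(\d+)\0/$quoted[$1]/g for @fields;

print "[$_]\n" for @fields;
```

The split pattern stays trivial because no delimiter can appear inside a
placeholder, and the whitespace around each delimiter is discarded by
the split itself.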

HTH

-- 
Brian Raven





_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
