Hi Barry,

On Thu, Nov 10, 2011 at 2:34 AM, Barry Brevik <bbre...@stellarmicro.com> wrote:
> Below is some test code that will be used in a larger program.
>
> In the code below I have a regular expression who's intent is to look
> for  " <1 or more characters> , <1 or more characters> " and replace the
> comma with |. (the white space is just for clarity).
>
> IAC, the regex works, that is, it matches, but it only replaces the
> final match. I have just re-read the camel book section on regexes and
> have tried many variations, but apparently I'm too close to it to see
> what must be a simple answer.
>
> BTW, if you guys think I'm posting too often, please say so.
>
> Barry Brevik
> ============================================
> use strict;
> use warnings;
>
> my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;
>
> print "before comma substitution: $csvLine\n\n";
>
> $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
>
> print "after comma substitution.: $csvLine\n\n";
>

Tobias already gave you a solution, and I also think that using
Text::CSV or Text::CSV_XS is a much better fit for this task than
plain regexes. For example, one day you might encounter a line that
has an embedded " escaped with \. Even if your regex worked earlier,
such a line can break it. And what if there was already a | in the
original string?


Nevertheless, let me also try to explain the issue you had with the
regex, as it can come up in other situations.

First, I'd probably use a plain " instead of \x22, as that makes it
easier for the reader to see what you are looking for.

Second, the /s at the end probably has no value. It only changes the
behavior of . so that it also matches newlines. If you don't have
newlines in your string (e.g. because you are processing a file line
by line), then /s has no effect. That leaves this expression:

$csvLine =~ s/(".+),(.+")/$1|$2/;
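
To see the effect of /s in isolation, here is a minimal sketch. The
sample string and variable names are made up for illustration:

```perl
use strict;
use warnings;

# Made-up sample string with an embedded newline.
my $text = "one\ntwo";

# Without /s, . does not match the newline, so this fails.
my $without_s = ($text =~ /one.two/)  ? 1 : 0;

# With /s, . also matches the newline, so this succeeds.
my $with_s    = ($text =~ /one.two/s) ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```

This prints "without /s: 0, with /s: 1".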

Then, before going on, you need to check what this really matches, so
I replaced the above with

 if ($csvLine =~ s/(".+),(.+")/$1|$2/ ){
   print "match: <$1><$2>\n";
 }

and got

match: <"col , 1"  ,  col___'2' ,  col-3, "col><4">

You see, the .+ is greedy: starting from the first ", it matches as
much as it can while still letting the rest of the pattern match.
You'd be better off telling it to match as little as possible by
adding an extra ? after the quantifier:

 if ($csvLine =~ /(".+?),(.+?")/ ){
   print "match: <$1><$2>\n";
 }

prints this:

match: <"col >< 1">

Finally, you need to do the substitution globally, that is, not only
once but as many times as possible:

 $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

And the output is

after comma substitution.:   "col | 1"  ,  col___'2' ,  col-3, "col|4"
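
Note that this pattern replaces only one comma per quoted field. If a
field can hold several commas, one possible variation is to match each
quoted field as a whole and translate the commas inside it. This is
only a sketch: the helper name pipe_quoted_commas is hypothetical, and
it assumes that fields never contain escaped quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every comma inside a double-quoted
# field with |. Assumes fields contain no escaped quotes.
sub pipe_quoted_commas {
    my ($line) = @_;
    # [^"]* cannot cross a quote, so each match is exactly one quoted
    # field; /e runs the replacement as code, /g repeats it per field.
    $line =~ s{("[^"]*")}{ (my $field = $1) =~ tr/,/|/; $field }ge;
    return $line;
}

my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;
print pipe_quoted_commas($csvLine), "\n";
print pipe_quoted_commas(q{"a", "b,c,d"}), "\n";
```

For the test line this gives the same result as shown above, and the
second call prints "a", "b|c|d".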


But again, for CSV files that can have embedded commas, it is better
to use one of the real CSV parsers.
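
To make that concrete, here is a small sketch that runs the regex
approach on two made-up lines (they are not from your data). In the
first, a quoted field without a comma is followed by a quoted field
with one. In the second, the input already contains a |.

```perl
use strict;
use warnings;

# Made-up inputs, chosen to show where the regex approach breaks down.
my %result;
for my $line ( q{"a", "b,c"}, q{"a,b", c|d} ) {
    (my $out = $line) =~ s/(".+?),(.+?")/$1|$2/g;
    $result{$line} = $out;
    print "$line  =>  $out\n";
}
```

The first line becomes "a"| "b,c", so the field separator was replaced
instead of the embedded comma. The second becomes "a|b", c|d, where
you can no longer tell which | used to be a comma.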

regards
  Gabor

-- 
Gabor Szabo
http://szabgab.com/perl_tutorial.html