Hi Barry,

On Thu, Nov 10, 2011 at 2:34 AM, Barry Brevik <bbre...@stellarmicro.com> wrote:
> Below is some test code that will be used in a larger program.
>
> In the code below I have a regular expression who's intent is to look
> for " <1 or more characters> , <1 or more characters> " and replace the
> comma with |. (the white space is just for clarity).
>
> IAC, the regex works, that is, it matches, but it only replaces the
> final match. I have just re-read the camel book section on regexes and
> have tried many variations, but apparently I'm too close to it to see
> what must be a simple answer.
>
> BTW, if you guys think I'm posting too often, please say so.
>
> Barry Brevik
> ============================================
> use strict;
> use warnings;
>
> my $csvLine = qq| "col , 1" , col___'2' , col-3, "col,4"|;
>
> print "before comma substitution: $csvLine\n\n";
>
> $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
>
> print "after comma substitution.: $csvLine\n\n";
>
Tobias already gave you a solution, and I also think using Text::CSV or
Text::CSV_XS is a much better way to do this than plain regexes. For
example, one day you might encounter a line that has an embedded "
escaped with \. Then even if your regex worked earlier, this can break it.
And what if there was a | in the original string?

Nevertheless, let me also try to explain the issue you had with the
regex, as it can come up in other situations.

First, I'd probably use a plain " instead of \x22, as that makes it
easier for the reader to see what you are looking for.

Second, the /s at the end probably has no value. It only changes the
behavior of . so that it also matches newlines. If you don't have
newlines in your string (e.g. because you are processing a file line by
line), then /s has no effect.

That gives this expression:

    $csvLine =~ s/(".+),(.+")/$1|$2/;

Then, before going on, you need to check what this really matches, so I
replaced the above with

    if ($csvLine =~ s/(".+),(.+")/$1|$2/) {
        print "match: <$1><$2>\n";
    }

and got

    match: <"col , 1" , col___'2' , col-3, "col><4">

You see, the .+ is greedy: starting from the first " it matches as much
as it can, so the single match spans the whole line and only the last
comma gets replaced. You'd be better off telling it to match as little
as possible by adding an extra ? after the quantifier.

    if ($csvLine =~ /(".+?),(.+?")/) {
        print "match: <$1><$2>\n";
    }

prints this:

    match: <"col >< 1">

Finally, you need to do the substitution globally, so not only once but
as many times as possible:

    $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

And the output is

    after comma substitution.:  "col | 1" , col___'2' , col-3, "col|4"

But again, for CSV files that can have embedded commas, it is better to
use one of the real CSV parsers.

regards
   Gabor

-- 
Gabor Szabo
http://szabgab.com/perl_tutorial.html
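
P.S. To see the three steps side by side, here is a minimal sketch that
runs the greedy match, the non-greedy match, and the global substitution
on the sample line from the original post. The variable names and the
second sample line ($tricky) are made up for illustration and are not
part of the original program. The last step is only there to show how
fragile the regex is: it assumes every quoted field contains exactly one
comma, and it misfires as soon as a quoted field has none.

```perl
use strict;
use warnings;

# Sample line from the original post.
my $line = qq| "col , 1" , col___'2' , col-3, "col,4"|;

# 1. Greedy: .+ runs to the end of the line, then backtracks to the LAST comma.
my ($greedy_left, $greedy_right) = $line =~ /(".+),(.+")/;

# 2. Non-greedy: .+? stops at the first comma after the first quote.
my ($lazy_left, $lazy_right) = $line =~ /(".+?),(.+?")/;

# 3. Non-greedy plus /g: every quoted pair is handled. Work on a copy.
(my $fixed = $line) =~ s/(".+?),(.+?")/$1|$2/g;

# 4. Hypothetical line whose first quoted field has no comma:
#    the regex now replaces a comma that is a real field separator.
my $tricky = q{"a", b, "c,d"};
(my $broken = $tricky) =~ s/(".+?),(.+?")/$1|$2/g;

print "greedy: <$greedy_left><$greedy_right>\n";
print "lazy:   <$lazy_left><$lazy_right>\n";
print "fixed:  $fixed\n";
print "broken: $broken\n";
```

The fourth line of output is "a"| b, "c,d": the separator after "a" was
replaced and the comma inside "c,d" was left alone, which is the kind of
surprise a real CSV parser avoids.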