Tim Johnson wrote:
David, if you don't mind, could you give an explanation of what you're doing here? I'm not sure if everyone is familiar with the method you're using to look ahead.
I will take a crack at this.
-----Original Message-----
From: david [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 12, 2003 5:28 PM
To: [EMAIL PROTECTED]
Subject: Re: misunderstanding the /g modifier on REGEX
M Z wrote:
hello all -
I am trying to do the following to this data:
input: X|Y||||Z||A
desired output: X|Y| | | |Z| |A
simply replacing || with | | wherever it may occur in the string.
This bit of code doesn't seem to do all of the job.
What is wrong with this code?
while (<>) {
    while ($_ =~ /([|])([|])/g) {
        $_ =~ s/([|])([|])/$1 $2/g;
        print "$_";
    }
}
The problem seems to be that my bit of code doesn't completely catch "all" of the || occurrences within a given line. Please help!!!
try:
#!/usr/bin/perl -w
use strict;
(my $string = 'X|Y||||Z||A') =~ s/\|(?=\|)/| /g;
Let's break the regexp down to understand it.
Matching part: /\|(?=\|)/
This matches any '|' that is followed by another '|' without including the second '|' in $&. What this also means is that, after the match the regexp engine will be pointing at the second '|'.
The next time around the second '|' will be in $& and so on. This is how you achieve overlapping matches that the OP wants.
If your regexp were just /\|\|/, it would still match a '|' followed by another '|'. But after the match, the regexp engine would be pointing at the location immediately after the second '|'. The second '|' has then already been consumed, so it cannot start a new match, and it will not be substituted even if it is immediately followed by another '|'.
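To see the difference side by side, here is a small sketch that runs both substitutions on the OP's input. The variable names ($input, $consumed, $lookahead) are only for illustration and are not from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = 'X|Y||||Z||A';

# Plain pattern: every match consumes both pipes, so the pipe that ends
# one match can never begin the next one.
(my $consumed = $input) =~ s/\|\|/| |/g;

# Look-ahead pattern: only the first pipe is consumed; the second pipe is
# merely inspected and is still available to start the next match.
(my $lookahead = $input) =~ s/\|(?=\|)/| /g;

print "consuming:  $consumed\n";    # consuming:  X|Y| || |Z| |A
print "look-ahead: $lookahead\n";   # look-ahead: X|Y| | | |Z| |A
```

Note how the consuming version leaves a '||' in the middle of the run of four pipes, which is the symptom the OP described.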
Hopefully this example will clear things up:
#!/usr/local/bin/perl
use strict;
use warnings;
my $str = 'ab';
my @look_ahead = $str =~ /(?=(\w))(?=(\w))/;
my @no_look_ahead = $str =~ /(\w)(\w)/;

print "Using look ahead: @look_ahead\n";
print "Without look ahead: @no_look_ahead\n";
The output of the above code will be:
Using look ahead: a a
Without look ahead: a b
I think the second line of output is expected. 'a' is repeated in the first line because
1) The first (?=(\w)) matches the sub-string 'a' and this is stored in index 0.
2) Since the sub-string 'a' is not included in $&, the regexp engine has not moved forward. It is still pointing at 'a'
3) Due to step 2, the second (?=(\w)) will once again match the sub-string 'a'. This is why it is printed again.
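If you want to check steps 1-3 yourself, a rough sketch like the following inspects the match right after it succeeds. It relies on the standard variables $& (the matched text) and @- / @+ (start and end offsets of the match); the names $matched_len, $start, $end, $first and $second are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'ab';
my ($matched_len, $start, $end, $first, $second);

if ($str =~ /(?=(\w))(?=(\w))/) {
    $matched_len = length($&);        # look-aheads consume nothing
    ($start, $end) = ($-[0], $+[0]);  # offsets where the whole match starts and ends
    ($first, $second) = ($1, $2);     # both groups captured at the same spot
}

print "length of \$&: $matched_len\n";      # length of $&: 0
print "match spans: $start..$end\n";        # match spans: 0..0
print "captures: $first $second\n";         # captures: a a
```

The match starts and ends at offset 0, which is another way of saying the engine never moved past 'a'.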
How does the /g modifier affect this behaviour? Change the code above to add the /g modifier to both regexes.
Now the output changes to:
Using look ahead: a a b b
Without look ahead: a b
Once again the second line of output is expected. Now to the first line, steps 1-3 mentioned above still hold.
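After these 3 steps, the regex engine sees that the /g modifier is set and tries to find another match. Because the previous match was zero-length, the engine will not accept another zero-length match at the same position, so it moves forward by one location. You would have an infinite loop if it stayed put.
After step 3 the regex engine is now pointing at 'b', and the three steps repeat themselves with 'b'.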
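One way to watch the engine step forward is to run a look-ahead-only pattern in a while loop and record pos() on each pass. This is only a sketch; @trace is a name invented for the example, and it uses a single look-ahead group to keep the output short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'ab';
my @trace;

# Each successful match is zero-length, so pos() equals the offset at
# which the look-ahead succeeded. Reading pos() does not disturb the loop.
while ($str =~ /(?=(\w))/g) {
    push @trace, pos($str) . ":$1";
}

print "@trace\n";   # 0:a 1:b
```

The loop ends after two passes: at offset 2 there is no word character left for the look-ahead to see, so the match fails.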
print "GET: $string\n";
print "EXPECTED: X|Y| | | |Z| |A\n";
__END__
prints:
GET: X|Y| | | |Z| |A
EXPECTED: X|Y| | | |Z| |A
david