Tim Johnson wrote:

David, if you don't mind, could you give an explanation of what you're doing
here?  I'm not sure if everyone is familiar with the method you're using to
look ahead.


I will take a crack at this.



-----Original Message-----
From: david [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 12, 2003 5:28 PM
To: [EMAIL PROTECTED]
Subject: Re: misunderstanding the /g modifier on REGEX


M Z wrote:




hello all -

I am trying to do the following to this data:
input:
X|Y||||Z||A

desired output:
X|Y| | | |Z| |A

simply replacing || with | |
wherever it may occur in the string.

This bit of code doesn't seem to do all of the job.

What is wrong with this code?

while (<>) {
    while($_ =~ /([|])([|])/g) {
        $_ =~ s/([|])([|])/$1 $2/g;
        print "$_";
    }
}

The problem seems to be that my bit of code doesn't
completely catch "all" of the || occurrences within a
given line.  Please help!!!




try:


#!/usr/bin/perl -w

use strict;

(my $string = 'X|Y||||Z||A') =~ s/\|(?=\|)/| /g;


Let's break the regexp down to understand it.

Matching part: /\|(?=\|)/

This matches any '|' that is followed by another '|' without including the second '|' in $&. What this also means is that, after the match the regexp engine will be pointing at the second '|'.
The next time around the second '|' will be in $& and so on. This is how you achieve overlapping matches that the OP wants.
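
To make the matched positions visible, here is a small sketch (the variable names are mine, not from the original post). It records the start offset of every match through the built-in @- array, assuming the same input string as the OP's:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'X|Y||||Z||A';
my @starts;

# $-[0] is the start offset of the whole match ($&).
# Each match is exactly one '|' long, because the lookahead
# checks the next character without consuming it.
while ($string =~ /\|(?=\|)/g) {
    push @starts, $-[0];
}

print "Match offsets: @starts\n";   # prints: Match offsets: 3 4 5 8
```

Offsets 3, 4 and 5 are adjacent, which shows that every '|' in the run of four except the last one gets its own match.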


If your regexp were just /\|\|/ (both pipes must be escaped, since a bare | means alternation), it would still match a '|' followed by another '|'. After the match, however, the regexp engine would be pointing at the location immediately after the second '|', because both characters are part of $&. The second '|' would therefore not be substituted, even when it is immediately followed by another '|'.
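
A side-by-side sketch of the two substitutions on the OP's input may help here. The variable names $consumed and $lookahead are only illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = 'X|Y||||Z||A';

# Consuming both pipes: the second '|' of each pair is already part
# of the match, so it cannot start a new match with the next '|'.
(my $consumed = $input) =~ s/\|\|/| |/g;

# Lookahead: only the first '|' is consumed, so the second '|' is
# still available as the start of the next match.
(my $lookahead = $input) =~ s/\|(?=\|)/| /g;

print "Without look ahead: $consumed\n";    # X|Y| || |Z| |A
print "With look ahead:    $lookahead\n";   # X|Y| | | |Z| |A
```

The middle '||' survives in the first result because the four pipes were matched as two separate, non-overlapping pairs.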

Hopefully this example will clear things up:

#!/usr/local/bin/perl
use strict;
use warnings;

my $str = 'ab';

my @look_ahead    = $str =~ /(?=(\w))(?=(\w))/;
my @no_look_ahead = $str =~ /(\w)(\w)/;

print "Using look ahead:   @look_ahead\n";
print "Without look ahead: @no_look_ahead\n";

The output of the above code will be:
Using look ahead:   a a
Without look ahead: a b

I think the second line of output is expected. 'a' is repeated in the first line because:
1) The first (?=(\w)) matches the sub-string 'a', and this is captured in $1 (index 0 of @look_ahead).
2) Since the sub-string 'a' is not included in $&, the regexp engine has not moved forward. It is still pointing at 'a'.
3) Due to step 2, the second (?=(\w)) will once again match the sub-string 'a', this time capturing it in $2. This is why it is printed again.
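
The three steps above can be checked with a short sketch that inspects the offsets of the whole match. @- and @+ are Perl's built-in match-offset arrays; the other variable names are mine:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'ab';

my ($start, $end, $first, $second);
if ($str =~ /(?=(\w))(?=(\w))/) {
    # $-[0] and $+[0] are the start and end offsets of $&.
    ($start, $end)    = ($-[0], $+[0]);
    ($first, $second) = ($1, $2);
}

print "Whole match spans $start..$end\n";   # prints: Whole match spans 0..0
print "Captures: $first $second\n";         # prints: Captures: a a
```

A span of 0..0 means the overall match is zero-length: both lookaheads inspected 'a' without the engine ever advancing past it.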


How does the /g modifier affect this behaviour?
Change the code above to add the /g modifier to both regexes.

Now the output changes to:
Using look ahead:   a a b b
Without look ahead: a b

Once again the second line of output is expected. As for the first line, steps 1-3 mentioned above still hold.
After these 3 steps, the regex engine sees that the /g modifier is set and that the match it just made was zero-length. It therefore moves forward by one location before trying again; if it stayed put, it would match at the same spot forever.
The regex engine is now pointing at 'b', and the three steps repeat themselves with 'b'. After that there is nothing left to match, so the search ends.
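
For reference, here is the earlier example with /g added to both regexes, followed by a scalar-context loop that uses pos() to show where each zero-length match happens. The @trace array and the loop are my additions for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'ab';

# In list context, /g returns the captures from every match.
my @look_ahead    = $str =~ /(?=(\w))(?=(\w))/g;
my @no_look_ahead = $str =~ /(\w)(\w)/g;

print "Using look ahead:   @look_ahead\n";      # a a b b
print "Without look ahead: @no_look_ahead\n";   # a b

# In scalar context, /g finds one match per iteration and pos()
# reports where that match ended. A zero-length match ends where it
# started, so pos() shows the offset the engine was pointing at.
my @trace;
while ($str =~ /(?=(\w))/g) {
    push @trace, pos($str) . ":$1";
}

print "Trace: @trace\n";   # prints: Trace: 0:a 1:b
```

The trace shows the engine matching at offset 0, stepping forward one character, matching at offset 1, and then stopping.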



print "GET:      $string\n";
print "EXPECTED: X|Y| | | |Z| |A\n";

__END__

prints:

GET:      X|Y| | | |Z| |A
EXPECTED: X|Y| | | |Z| |A

david





