tom arnall wrote on Monday, 26 June 2006, 20:42:
[...]
> do you have any idea why:
>
> $_ = " x11x  x22x a ";
>
> $re1 = qr/x.*?\d\dx|a/;
> $re2 = qr/($re1\s)?$re1/;
> ($_) = /($re2)/;
> print $_;
>
> doesn't produce 'x11x' ? (note btw that if you insert '\n' between the
> first two tokens of the target string, the result does become 'x11x'. note
> also that if you drop '|a' from $re1 you also get 'x11x'.)

Do you mean, by this paragraph, the following?

#!/usr/bin/perl
use strict;
use warnings;

sub tst {
    my ($prefix, $s, $re1) = @_;
    my $re2 = qr/($re1\s)?$re1/;
    $s =~ /($re2)/ && print "$prefix <$1>\n";
}

tst('1: ', ' x11x  x22x a ',   qr/x.*?\d\dx|a/);  # orig
tst('2: ', " x11x \n x22x a ", qr/x.*?\d\dx|a/);  # \n
tst('3: ', ' x11x  x22x a ',   qr/x.*?\d\dx/);    # without |a

# produces:
1:  <x11x  x22x a>
2:  <x11x>
3:  <x11x>

and you wonder why 1: does not match only 'x11x'?

I'll try to explain what happens when 1: is matched. It's not very concise,
and I'm *not* sure it's correct. Somebody please correct me if I'm wrong:

> i read this example as follows:
>
> $re1 = qr/
> x #find an 'x'
> .*? #find whatever of whatever length
> \d\d #find two digits
> x #find an 'x'

In the first $re1 part of $re2 below, this finds 'x11x', using the shortest
(non-greedy) interpretation of .*?,

> | #or, instead of all the foregoing,
> a #find an 'a'

so the |a alternative does not need to be tested at this point.

> /x;

[[Start $re2]]

> $re2 = qr/
> (
> $re1 #find $re1

See comments above: 'x11x' is found,

> \s #and whitespace

and \s too (the first of the two whitespace characters between 'x11x' and
'x22x').

> )? #or maybe none of the foregoing

Now we have matched 'x11x ', but

> $re1 #find for sure $re1

this 2nd $re1 cannot match anything here, because the next unmatched
character is whitespace, whereas the 2nd $re1 expects an 'x' (or an 'a').

> #in sum, find $re1 possibly preceded by
> $re1+whitespace

Not only that. Yes, the first $re1 is optional and the second is mandatory,
but the match found so far by the first $re1 is not usable, because the
second one can't match after it. So the engine backtracks and tries
*another* way of matching the first $re1. That is possible by matching
'x11x  x22x' (the .*? matching '11x  x'); the \s then takes the following
space, and the 2nd $re1 can match the leftover 'a'. This way $re2 matches
everything from the first 'x' through the 'a'.

So, with my interpretation, skipping the ()? group is only tried *after*
every way of matching it has failed. That confused me at first, since the
2nd $re1 alone *could* match 'x11x'. But as far as I understand it, a plain
? is a greedy quantifier (it prefers one occurrence over none; the minimal
variant would be ??), and Perl takes the first match it finds in
backtracking order, not the shortest or the longest one. When the 2nd $re1
fails, the engine goes back to the most recent choice point, which is the
.*? inside the group, before it gives up on the group itself. [Backtracking
goes from the inner to the outer.]

That also fits 2: and 3:. In 2: the .*? cannot cross the "\n", and in 3:
there is no |a to match the leftover 'a', so in both cases every attempt
with the group fails, and only then is the group skipped.

> /x;

I hope I'm not adding to the confusion here... including mine...

Dani
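
One way to check the reading above is to compare the greedy group with a lazy one, and with a variant of $re1 that cannot cross whitespace. This is only a minimal sketch: it assumes the target string has two spaces between 'x11x' and 'x22x' (as the explanation implies), and the helper first_match and the names $greedy, $lazy and $tight are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re in $s, or undef.
sub first_match {
    my ($s, $re) = @_;
    return $s =~ /($re)/ ? $1 : undef;
}

my $s   = ' x11x  x22x a ';            # assumed: two spaces after 'x11x'
my $re1 = qr/x.*?\d\dx|a/;

# Greedy optional group: the engine tries every way of matching the
# group before it tries leaving the group out.
my $greedy = qr/(?:$re1\s)?$re1/;

# Lazy optional group: the engine tries leaving the group out first.
my $lazy = qr/(?:$re1\s)??$re1/;

# Same greedy group, but \S*? cannot stretch across whitespace, so the
# first alternative can no longer swallow 'x11x  x22x'.
my $tight        = qr/x\S*?\d\dx|a/;
my $greedy_tight = qr/(?:$tight\s)?$tight/;

printf "greedy: <%s>\n", first_match($s, $greedy);        # <x11x  x22x a>
printf "lazy:   <%s>\n", first_match($s, $lazy);          # <x11x>
printf "tight:  <%s>\n", first_match($s, $greedy_tight);  # <x11x>
```

If only the first token is wanted, either ?? on the group or a pattern that cannot cross whitespace avoids the long match; which one is appropriate depends on what the tokens may contain.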