Re: puzzling '.{1,4}'

D. Bolliger Mon, 26 Jun 2006 15:35:16 -0700

tom arnall am Montag, 26. Juni 2006 20:42:
[...]
> do you have any idea why:
>
>       $_ = " x11x  x22x a ";
>
>       $re1 = qr/x.*?\d\dx|a/;
>       $re2 = qr/($re1\s)?$re1/;
>       ($_) = /($re2)/;
>       print $_;
>
> doesn't produce 'x11x' ? (note btw that if you insert '\n' between the
> first two tokens of the target string, the result does become 'x11x'. note
> also that if you drop '|a' from $re1 you also get 'x11x'.)



# Do you mean by this paragraph:

#!/usr/bin/perl
use strict;
use warnings;

sub tst {
  my ($prefix, $s, $re1) = @_;

  my $re2 = qr/($re1\s)?$re1/;

  $s=~/($re2)/ && print "$prefix <$1>\n";
}

tst ('1: ', ' x11x  x22x a ', qr/x.*?\d\dx|a/); # orig
tst ('2: ', " x11x \n x22x a ", qr/x.*?\d\dx|a/); # \n
tst ('3: ', ' x11x  x22x a ', qr/x.*?\d\dx/); # without |a

# produces:

1:  <x11x  x22x a>
2:  <x11x>
3:  <x11x>

# and you wonder why 1: does not match only 'x11x' ?

I'll try to explain what happens when matching 1:. It's not very concise,
and I'm *not* sure it's correct. Somebody please correct me if I'm wrong:

> i read this example as follows:
>
>       $re1 = qr/
>       x                               #find an 'x'
>       .*?                             #find whatever of whatever length
>       \d\d                            #find two digits
>       x                               #find an 'x'

This finds, in the first $re1 part of $re2 below, 'x11x', using the
shortest non-greedy interpretation of .*?,

>       |                               #or, instead of all the foregoing,
>       a                               #find an 'a'

so that the |a alternative above no longer needs to be tested.

>       /x;

[[Start $re2]]

>       $re2 = qr/
>       (
>       $re1                            #find $re1

See comments above: 'x11x' is found,

>       \s                              #and whitespace

and \s too (one of the two \s between 'x11x' and 'x22x').

>       )?                              #or maybe none of the foregoing

Now, we matched 'x11x ', but 

>       $re1                            #find for sure $re1

this 2nd $re1 cannot match anything, because the next unmatched char is \s, 
whereas the 2nd $re1 expects an 'x' (or an 'a').

>                                       #in sum, find $re1 possibly preceded by 
> $re1+whitespace

Not only that. Yes, the first $re1 is optional and the second is mandatory;
but the match found so far by the first $re1 cannot stand, because the second
$re1 can't match after it.

Now *another* match variant for the first $re1 is tried. One exists: the first
$re1 matches 'x11x  x22x' (the .*? matching '11x  x'), the \s matches the blank
after it, and the 2nd $re1 can match the leftover 'a'. $re2 matches
'x11x  x22x a' this way.
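
To check this reading, here is a small sketch that prints what the optional
group and the whole pattern captured for case 1. The names $whole and $group
are mine, and the extra outer parentheses are only there to capture the
complete match:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s   = ' x11x  x22x a ';
my $re1 = qr/x.*?\d\dx|a/;

# Same shape as $re2, wrapped in an outer capturing group:
# $1 is the whole match, $2 is the optional ($re1\s) group.
my ($whole, $group) = $s =~ /(($re1\s)?$re1)/;

print "whole: <$whole>\n";   # whole: <x11x  x22x a>
print "group: <$group>\n";   # group: <x11x  x22x >
```

The group ends up holding 'x11x  x22x ' (with the trailing blank), which
leaves only the 'a' for the 2nd $re1.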

It seems, by my interpretation, that skipping the ()? group is tried only
*after* every non-empty match with it has failed. That would fit a plain ?
being greedy (the minimal form is ??): the engine prefers to use the group,
and the 2nd $re1 alone matching 'x11x' is only the fallback.
   I'm still a bit unsure here. Presumably the .*? is backtracked into before
the ()? containing it is given up. [backtracking goes from the inner to the
outer?]
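
One way to test the role of the optional group is to compare it with its
minimal form, ??, which tries to skip the group first. This is only a sketch:
the names $greedy and $lazy are mine, and I use (?:...) instead of (...) so
that the only capture is the whole match:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s   = ' x11x  x22x a ';
my $re1 = qr/x.*?\d\dx|a/;

# Greedy optional group: the engine tries to use the group first and
# backtracks into the .*? inside it before it tries to skip the group.
my ($greedy) = $s =~ /((?:$re1\s)?$re1)/;

# Minimal optional group (??): skipping the group is tried first.
my ($lazy) = $s =~ /((?:$re1\s)??$re1)/;

print "greedy: <$greedy>\n";   # greedy: <x11x  x22x a>
print "lazy:   <$lazy>\n";     # lazy:   <x11x>
```

With ?? the result is 'x11x', so the long match in case 1 does seem to come
from the group being preferred over the empty match.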

>       /x;

I hope I'm not adding to the confusion here... including mine...

Dani
