On 2/1/06, Greg London <[EMAIL PROTECTED]> wrote:
> I'm trying to understand how multiple quantifiers in a
> regular expression get massaged to fit the data.
> In other words:

Read Mastering Regular Expressions.  Really.

> my $string = "was not was not was not the time was not was not was not was not ";
>
> if($string =~ m{(.*)was(.*)the time}) {
>      print "matched\n";
>      print "1 is '$1'\n";
>      print "2 is '$2'\n";
> }
>
> The first (.*) swallows the whole string,
> and fails when it sees "was" in the regular
> expression, so it backs up until it can find
> a "was" in the string, and then it fails when
> it looks for "the time", so it backs up again
> until it finds a "was" followed by "the time".

Exactly.
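
To see where the backing up ends, here is a small sketch that runs the quoted pattern and prints both captures. The captures() helper is a name made up for this illustration, and the lazy (.*?) variant is not from the original question; it is included only as a contrast to the greedy behavior described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns ($1, $2)
# when the pattern matches, or an empty list when it does not.
sub captures {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? ($1, $2) : ();
}

my $string = "was not was not was not the time was not was not was not was not ";

# Greedy: the first (.*) grabs everything, then gives characters back
# until "was" is followed (after the second group) by "the time".
my @greedy = captures(qr{(.*)was(.*)the time}, $string);
print "greedy: 1 is '$greedy[0]', 2 is '$greedy[1]'\n";
# greedy: 1 is 'was not was not ', 2 is ' not '

# Lazy, for contrast: each group starts empty and grows only as needed.
my @lazy = captures(qr{(.*?)was(.*?)the time}, $string);
print "lazy:   1 is '$lazy[0]', 2 is '$lazy[1]'\n";
# lazy:   1 is '', 2 is ' not was not was not '
```

The greedy result shows that the first group ends just before the last "was" that still precedes "the time", which is where the backtracking described above stops.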

> I know it "just works", but I'm trying to figure
> out exactly how it behaves under the hood.
> I think if I were to write it in pseudocode,
> it would look something like this:
>
> foreach my $quantifier ( @list_of_quantifiers ) { # ($1 .. $n)
>      while(nomatch) {
>           $quantifier->reducecaptureby1andtryagain;
>      }
> }
>
> But I think there might be other ways of
> implementing it that would be subtly different.
>
> anyone?

That is basically what happens, modulo a very long list of
optimizations.  One of the most important is that Perl keeps a memory
of which combinations of position in the regular expression and
position in the string it has already tried, and doesn't try them
again (unless it has done something that could change the match, like
capture a substring).

The following code demonstrates that optimization:

  my $string = "foo " x 100;
  $string .= "fo";
  if ($string =~ /^(\s*foo\s*)*$/) {
    print "Matched\n";
  }
  else {
    print "Didn't match\n";
  }

If you run that on a current Perl, it will quickly figure out that it
doesn't match.  On Perl 5.005 it won't figure out that it doesn't
match in the expected lifetime of the planet.  If you translate that
code to virtually any language other than Perl, it will also never
figure out that it isn't a match.
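
The idea of remembering which (regex position, string position) pairs have already failed can be sketched with a toy matcher. This is only an illustration under stated assumptions, not how perl's engine is actually written: toy_match() and its pattern syntax (literal characters, '.', and a '*' suffix on a single character, anchored at both ends) are invented for this example. It has no captures, so the caveat about captures never comes up, and its slowdown without the memory is polynomial rather than the exponential one in the example above.

```perl
use strict;
use warnings;

# Toy backtracking matcher (hypothetical; not perl's real engine).
# Returns (matched, steps), where steps counts attempted positions.
sub toy_match {
    my ($pattern, $string, $use_memo) = @_;
    my @pat = $pattern =~ /(.\*?)/g;    # tokens such as 'a', '.', 'a*'
    my @str = split //, $string;
    my %failed;                         # "pattern index,string index" pairs that failed
    my $steps = 0;

    my $try;
    $try = sub {
        my ($pi, $si) = @_;
        $steps++;
        # The optimization: never retry a combination known to fail.
        return 0 if $use_memo && $failed{"$pi,$si"};

        my $ok = 0;
        if ($pi == @pat) {
            $ok = $si == @str ? 1 : 0;  # pattern used up: succeed only at end of string
        }
        else {
            my ($char, $star) = split //, $pat[$pi];
            if ($star) {
                # Greedy: take as many characters as possible, then
                # give them back one at a time until the rest matches.
                my $max = 0;
                $max++ while $si + $max < @str
                    && ($char eq '.' || $char eq $str[$si + $max]);
                for (my $n = $max; $n >= 0 && !$ok; $n--) {
                    $ok = $try->($pi + 1, $si + $n);
                }
            }
            elsif ($si < @str && ($char eq '.' || $char eq $str[$si])) {
                $ok = $try->($pi + 1, $si + 1);
            }
        }
        $failed{"$pi,$si"} = 1 unless $ok;
        return $ok;
    };

    my $matched = $try->(0, 0);
    undef $try;                         # break the closure's reference to itself
    return ($matched, $steps);
}

# Same failing match, without and with the memory of failed positions.
for my $memo (0, 1) {
    my ($ok, $steps) = toy_match('a*a*a*a*b', 'a' x 20, $memo);
    printf "memo=%d matched=%d steps=%d\n", $memo, $ok, $steps;
}
```

Both runs report no match, but the run with the memory should need far fewer steps, because each failed combination is explored only once.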

Cheers,
Ben
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
