Peter Heslin writes:
:On Wed, Aug 30, 2000 at 11:54:29PM -0400, Mark-Jason Dominus wrote:
:> Perhaps Hugo van der Sanden
:> would be willing to discuss this with you in more detail?
:
:I am not acquainted with the gentleman you name.  Please do solicit
:the input of others you know who might be interested.

That's ok, you don't need to be acquainted with me if you don't
want to be. :)

:Simply put, I want variable-length lookbehind.

The difficulty with variable-length lookbehind (let's call it
VLLB) is this: suppose that we want to match "abcdef...xyz" =~
/(?<=x+)y/. In theory, to check the possible /x+/ matches in
the right order [0] we need to check whether there we can match
0 characters at offset 0 (no), then check whether we can match
1 character at offset 0 or 0 characters from offset 1 (no), then
check whether we can match 2 characters from offset 0, or 1 from
1, or 0 from 2 ... and so it goes on. So we'd be trying a complete
subpattern, of arbitrary complexity, against O(n^2) strings.

Now in practice it isn't quite so bad - in the above example we
could expect the optimiser to decide that there is a fixed
substring 'y' at the start of the match, so we'd jump straight
to an offset where we found that, then do only O(n) matches of
the possible substrings preceding it. But that only kicks in
where the optimiser finds something good to work with.

It is also undoubtedly possible to find further optimisations
of this repeated matching: even if the optimiser didn't pick
up on the 'y', we'd know after the first attempt at each 
position that x+ would not match a string of _any_ length at
that position. However, the type of optimisations needed for
this are quite different from the ones we already have, 
primarily because none of the existing regexp features look
for possible matches in this order.

So I think VLLB is a great idea, undeniably possible, but
it will take a lot of work just to get it working to the
point that it is ludicrously slow in almost all cases; getting
the speed up will be a much greater challenge.

Hugo

[0] this assumes that the right thing is to match the /y/ in
the leftmost position that satisfies the complete regexp, and to
match the /x+/ starting from the leftmost position that permits
this. While not entirely sure, I think this is the only definition
consistent with what has gone before.

Reply via email to