Lasse Reichstein wrote:

I would simply apply same logic we have already for the look ahead ... or
you think that would cause problems?

I'm not sure it even makes sense.

ES RegExps are backtracking based, and it makes a difference in which
order alternatives are tried. Greedy matching is defined in terms of
number of repetitions, not length of the match. All of these are
defined in a way that assumes left-to-right matching.

Example:
 Take the RegExp  /(?<((?:aa|aaa)+))b/  where (?< ... ) delimits the
look-behind.
 and try matching it on the string "xaaaaaaaaab".
 Then tell me how many a's are captured by the capturing group, and why :)

The most "intuitive" interpretation would be a reverse implementation
of the normal matching algorithm, i.e., "backwards matching", but that
would likely duplicate the entire RegExp semantics (or parameterize it
by a direction).

Any attempt to use the normal (forward) semantics and then try to find
an earlier point to start it at is likely to be either flawed or
effectively unpredictable to users.

Technically, you're right. They're different. But they can appear exactly the same by implementing lookbehind as a zero-length assertion of (?:lookbehind)$ matched against the lookbehind's left context, starting from the very start of the subject string. Although people thinking about implementation might come to think of some other approach as more intuitive, from my experience every single plain-old-developer unconcerned about implementation thinks of the semantics I just described as intuitive. It is also how every single implementation of lookbehind that I know of actually works.

The reason that all major regex flavors except .NET don't support lookbehind is because it's inefficient to re-search from the very beginning of an arbitrarily long string. That's why they support fixed- or finit-length lookbehind only--if they can determine the maximum distance backward they need to search forward from, they can step back only that many characters. In practice, at least for finite- rather than fixed-length lookbehind, this attempt to avoid far-back searches is kind of silly--e.g., Java lets you use a quantifier like {0,100000} within lookbehind.

The Right-to-Left Mode that powers .NET's lookbehind is pretty neat. It magically follows the plain-old-developer's intuitive expectation while working backword rather than from the start of the string. Unfortunately, how it actually works is fairly mysterious. Although it works fairly reliably, as I previously mentioned it can occasionally be a bit buggy/weird.

And you will probably never achieve that /(<re>)$/ and /(?<(re))$/
always capture the same substring :)

Apart from potential bugs, (<re>)$ and (?<=(<re>))$ capture the same string in every implementation of lookbehind that I know of.

--Steven Levithan


_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss

Reply via email to