Old thread of interest: https://esdiscuss.org/topic/regexp-lookbehind <https://esdiscuss.org/topic/regexp-lookbehind>
I've also personally thought quite thoroughly on how to add look-behind assertions in ES RegExps. Your approach imposes that the exact length of a string matched by the sub-pattern `( ? < = Disjunction )` is statically determinable This is, indeed, the original semantics of Perl 5. Another more powerful approach (which I prefer), is the one followed by .NET: the sub-pattern `( ? < = Disjunction )` is tentatively matched _backwards_. The advantage is that it does not impose any restriction on the subpattern, while remaining as efficient as possible (in theory, about as efficient as an equivalent look-ahead assertion). The main drawback is that it may be somewhat confusing for some types of complicated subpatterns, especially for those that contain backreferences. (Also it is more mind-bending to spec.) That said, I think there are easier missing features to be added, that are supported by most other RegExp flavours. I'm thinking of: * the `s`, or `.dotall` flag: the dot `.` matches every character, including newlines; * true support of Unicode, namely: escape sequences such as `\Lu` for uppercase letter, or `\X` for grapheme cluster. —Claude > Le 18 mai 2015 à 15:50, Nozomu Katō <[email protected]> a écrit : > > I submit a proposal for adding the look-behind assertions to RegExp. > > ---- > Abstract > > Basically, the evaluation for the look-behind assertion is performed by > "rewind the target sequence and do the same thing as the look-ahead > assertion": > > 1. When compiling a regular expression pattern, count the total number > of PatternCharacter and . and \ AtomEscape and CharacterClass in > Disjunction per look-behind assertion. Each Atom may be followed by > {n}, but may not be followed by the other quantifiers. > 2. When a look-behind assertion evaluated, rewind the target sequence > by that number and evaluate as if the look-ahead assertion from > that point. > > -- > Static Semantics: Early Errors (21.2.1.1 in ECMAScript Specifition 6) > > Assertion :: ( ? < = Disjunction ) > Assertion :: ( ? < ! Disjunction ) > > It is a Syntax Error if Disjunction contains Quantifier :: > QuantifierPrefix except QuantifierPrefix :: { DecimalDigits }. > > When Disjunction is separated by the | regular expression, if the > number of the sequence of code points which the left Alternative > matches and the number of the sequence of code points which the right > Disjunction matches are different, then it is a Syntax Error. > > > Assertion (21.2.2.6 in ECMAScript Specifition 6) > > The production Assertion :: ( ? < = Disjunction ) evaluates as follows: > > 1. Evaluate Disjunction to obtain a Matcher m. > 2. Return an internal Matcher closure that takes two arguments, a State > x and a Continuation c, and performs the following steps: > 1. Let n be the exact number of the sequence of code points which > Disjunction matches [NOTE]. > 2. Let xe be x's endIndex. > 3. If xe < n, return failure. > 4. Let xcap be x's captures List. > 5. Let y be the State (xe-n, xcap). > 6. Let d be a Continuation that always returns its State argument as a > successful MatchResult. > 7. Call m(y, d) and let r be its result. > 8. If r is failure, return failure. > 9. Let z be r's State. > 10. Let zcap be z's captures List. > 11. Let a be the State (xe, zcap). > 12. Call c(a) and return its result. > > The production Assertion :: ( ? < ! Disjunction ) evaluates as follows: > > 1. Evaluate Disjunction to obtain a Matcher m. > 2. Return an internal Matcher closure that takes two arguments, a State > x and a Continuation c, and performs the following steps: > 1. Let n be the exact number of the sequence of code points which > Disjunction matches [NOTE]. > 2. Let xe be x's endIndex. > 3. If xe < n, then call c(x) and return its result. > 4. Let xcap be x's captures List. > 5. Let y be the State (xe-n, xcap). > 6. Let d be a Continuation that always returns its State argument as a > successful MatchResult. > 7. Call m(y, d) and let r be its result. > 8. If r is not failure, return failure. > 9. Call c(x) and return its result. > > NOTE: n is computed by counting and summing up the following Atoms in > the Disjunction: > a. Atom :: PatternCharacter > b. Atom :: . > c. Atom :: \ AtomEscape > d. Atom :: CharacterClass > > If an Atom is followed by a "QuantifierPrefix :: { DecimalDigits }", > then that Atom is counted as the number of DecimalDigits instead of one. > > If the Disjunction contains an "Atom :: ( Disjunction )" or "Atom :: ( ? > : Disjunction )" that is followed by a "QuantifierPrefix :: { > DecimalDigits }", then the number of code points which that > inner-Disjunction matches is multiplied by the number of DecimalDigits. > > ---- > > If it is not necessarily important to strictly follow the syntax of Perl > 5, then adopting the following syntax may make implemetation simple: > > (?<{n}= Disjunction) > (?<{n}! Disjunction) > > where n in {} is the number of code points by which the target sequence > is rewound when a look-behind assertion is evaluated. In the case of > this syntax, it is not required to count Atoms in Disjunction and the > number to rewind the sequence of code points is taken from n directly. > > > Regards, > Nozomu > > _______________________________________________ > es-discuss mailing list > [email protected] > https://mail.mozilla.org/listinfo/es-discuss
_______________________________________________ es-discuss mailing list [email protected] https://mail.mozilla.org/listinfo/es-discuss

