On Mon, 15 Nov 2010 16:23:13 +0100, Marc Harter <wav...@gmail.com> wrote:

[look-behind allowing variable length body]

This was not my intention.  I am proposing zero-width lookbehind, which
would not allow for the case you specified above.

The grammar allows it. In ECMAScript it would be:
 "foobarbaz".match(/a(?<=(ob|bab)?)/)
which would match the first "a".
Had it been written
 "foobarbaz".match(/a(?<=(ob|bab)?.)/)
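
For reference, a minimal sketch of how the two forms above behave in an engine that evaluates the look-behind body backwards from the assertion position. This assumes ES2018-style look-behind, which postdates this thread; the variable names are only illustrative.

```javascript
// Assumption: the engine supports ES2018-style look-behind.
const first = "foobarbaz".match(/a(?<=(ob|bab)?)/);
// The optional group may match the empty string, so the assertion
// succeeds right after the first "a" without capturing anything.
console.log(first[0], first.index, first[1]);

const second = "foobarbaz".match(/a(?<=(ob|bab)?.)/);
// Here "." accounts for the "a" just matched, and the group then
// has to end immediately before it, where "ob" is found.
console.log(second[0], second.index, second[1]);
```

In both cases the overall match is the first "a"; only the capture differs.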


I will update the proposal.  It is my understanding that lookahead as
implemented in ECMAScript also is zero-width and not variable.  This is
also how Perl has implemented lookbehind.

The look-ahead in ECMAScript has a Disjunction as content, which basically means that it can contain *any* RegExp (including quantified statements and other lookaheads). This works fine because the semantics of the disjunction is the same as any other disjunction in a RegExp: it's matched forwards from a position in the input.
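
As an illustration of how much a look-ahead body may contain, here is a small sketch with alternatives, a quantified group and a nested negative look-ahead. The pattern and the test strings are made up for this example.

```javascript
// Look-ahead body is a full Disjunction: two alternatives, one with a
// quantified group, the other with a nested negative look-ahead.
const re = /foo(?=(?:bar)+|baz(?!qux))/;

const a = re.exec("foobarbar"); // "foo" followed by one or more "bar"
const b = re.exec("foobaz!");   // "foo" followed by "baz" not followed by "qux"
const c = re.exec("foobazqux"); // nested (?!qux) rejects this one

console.log(a && a[0], b && b[0], c);
```

The look-ahead itself consumes nothing, so the match is just "foo" whenever it succeeds.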

Your proposal also uses a Disjunction as body, but it's not specified how to
evaluate that body so that it *ends* at the position of the assertion.
Executing a RegExp "backwards" isn't trivial. Well, mostly it is, by symmetry,
but it's not part of the spec.
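
A rough sketch of the symmetry argument, for a purely literal body only: checking "ob" *before* an "a" is the same as checking "bo" *after* an "a" in the reversed input. The `reverse` helper is hypothetical and only safe for simple ASCII strings; this is not how the spec or any engine defines it.

```javascript
// Hypothetical helper: reverse a plain ASCII string.
function reverse(s) {
  return s.split("").reverse().join("");
}

const input = "foobarbaz";
const mirrored = reverse(input);

// Intended meaning: an "a" that is preceded by "ob".
// Mirrored form: an "a" that is followed by "bo" in the reversed input.
const m = /a(?=bo)/.exec(mirrored);

// Map the index in the reversed string back to the original string.
const index = m === null ? -1 : input.length - 1 - m.index;
console.log(index, input[index]);
```

Anything with captures, quantifiers or back-references needs more care than this, which is the part the proposal would have to spell out.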

The positive look-behind should probably be allowed to contain captures
that are still participating after the assertion succeeds (mirroring the semantics of
the positive look-ahead).
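
For comparison, this is the existing look-ahead behaviour being mirrored, as a small sketch with made-up patterns: a capture inside a positive look-ahead is still set after the assertion succeeds, while one inside a negative look-ahead is always undefined.

```javascript
// Capture inside a positive look-ahead survives the assertion.
const pos = /(?=(foo))foobar/.exec("foobar");
console.log(pos[0], pos[1]);

// Capture inside a negative look-ahead never participates in the result.
const neg = /(?!(x))foo/.exec("foo");
console.log(neg[0], neg[1]);
```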

I believe PCRE allows variable length (but structurally simple) look-behinds, where the structure ensures that it doesn't have to do backtracking while checking them, even though Perl itself does not [1]. Whether that's a desired property or not is a different question (I would actually prefer a full backwards-executed regexp to an artificial
restriction, but that's mainly ideology :).

/L
[1] http://www.regular-expressions.info/lookaround.html


http://perldoc.perl.org/perlre.html#Extended-Patterns

Updated Proposal:
https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM


Is there an example of a language that supports the full regexp power
in lookbehinds so we can look at their experiences with implementing
it?


As far as I know Perl is the de facto standard.




2010/11/15 Marc Harter <wav...@gmail.com>:
> Brendan et al.,
>
> I have created a proposal for look-behind provided at this link:
>
> https://docs.google.com/document/pub?id=1EUHvr1SC72g6OPo5fJjelVESpd4nI0D5NQpF3oUO5UM
>
> I hope it is a format that will be helpful for discussion with TC39.
> Admittedly, I have never written one of these before so am completely open
> to any feedback or ways to improve the document from yourself or anyone else
> on this list.
>
> Marc
>
> On Sat, 2010-11-13 at 09:32 -0600, Marc Harter wrote:
>
> I would be game to write up a proposal for this.  When would you need
> this by to discuss w/ TC39?
>
> Thanks for your consideration,
> Marc
>
> On Nov 12, 2010, at 5:04 PM, Brendan Eich <bren...@mozilla.com> wrote:
>
>> On Nov 12, 2010, at 2:52 PM, Marc Harter wrote:
>>
>>> After considering all the breadth this discussion could take maybe it
>>> would be wise to just focus on one issue at a time. For me, the biggest
>>> missing feature is lookbehind.  It's common to most languages
>>> implementing the Perl-RegExp-syntax, it is very useful when looking for
>>> patterns that follow or don't follow a particular pattern. I guess I'm
>>> confused why lookahead made it in but not lookbehind.
>>
>> This was 1998, Netscape 4 work I did in '97 was based on Perl 4(!), but we
>> proposed to ECMA TC39 TG1 (the JS group -- things were different then,
>> including capitalization) something based on Perl 5. We didn't get
>> everything, and we had to rationalize some obvious quirks.
>>
>> I don't remember lookbehind (which emerged in Perl 5.005 in July '98)
>> being left out on purpose. Waldemar may recall more, I'd handed him the JS
>> keys inside netscape.com to go do mozilla.org.
>>
>> If you are game to write a proposal or mini-spec (in the style of ES5
>> even), let me know. I'll chat with other TC39'ers next week about this.
>>
>> /be
>>
>>
>>> What do people
>>> think about including this feature?
>>>
>>> Marc
>>>
>>> On Fri, 2010-11-12 at 16:20 -0600, Marc Harter wrote:
>>>> I will start out with a disclaimer. I have not read both ECMAScript
>>>> specifications for 3 and now 5, so I admit that I am not an expert in
>>>> the spec itself but as a user of JavaScript, I would like to get some
>>>> expert discussion over this topic as proposed enhancements to the
>>>> RegExp engine for Harmony.
>>>>
>>>> I will start with a list of lacking features in JS as compared to Perl
>>>> provided by (http://www.regular-expressions.info/javascript.html):
>>>>
>>>>     * No \A or \Z anchors to match the start or end of the string.
>>>>       Use a caret or dollar instead.
>>>>     * Lookbehind is not supported at all. Lookahead is fully
>>>>       supported.
>>>>     * No atomic grouping or possessive quantifiers
>>>>     * No Unicode support, except for matching single characters with
>>>>       \uFFFF
>>>>     * No named capturing groups. Use numbered capturing groups
>>>>       instead.
>>>>     * No mode modifiers to set matching options within the regular
>>>>       expression.
>>>>     * No conditionals.
>>>>     * No regular expression comments. Describe your regular
>>>>       expression with JavaScript // comments instead, outside the
>>>>       regular expression string.
>>>>
>>>> I don't know if all of these "need" to be in the language but there
>>>> have been some that I have personally wanted to use:
>>>>
>>>>     * Lookbehind!  ECMAScript fully supports lookahead, why not
>>>>       lookbehind?  Seems like a big hole to me.
>>>>     * Named capturing groups and comments (e.g.
>>>>       http://xregexp.com/syntax/).  Mostly I argue for this because
>>>>       it makes RegExp matches more self-documenting.  Regular
>>>>       Expressions are already cryptic as it is.
>>>>
>>>> I do like some of the new flags proposed in
>>>> (http://xregexp.com/flags/) but personally haven't used them but maybe
>>>> that is something also for discussion.
>>>>
>>>> Marc Harter
>>>
>>> _______________________________________________
>>> es-discuss mailing list
>>> es-discuss@mozilla.org
>>> https://mail.mozilla.org/listinfo/es-discuss
>>
>
>
>


--
Lasse Reichstein - reichsteinatw...@gmail.com
