On Mar 24, 2012, at 1:11 PM, Wes Garland wrote:
> On 24 March 2012 15:25, David Herman <[email protected]> wrote:
> > Presumably the JS source, as a sequence of UTF-16 code units, represents
> > the tetragram code points as surrogate pairs.
>
> Clarification: the JS source *of the regexp literal*.
>
>
> We certainly can, although this means that certain Unicode Strings cannot be
> matched by a regexp with this flag. These strings would be the ones
> containing reserved code points.
I didn't mean to imply *only* allowing non-BMP ranges by their unescaped
representation, just that if it's possible that would often be nice and
readable. I would certainly expect that we should also allow
[\u{xxxxx}-\u{yyyyy}].
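
For illustration, here is a minimal sketch of the two spellings side by side. It assumes a `u` regexp flag and `\u{...}` code point escapes that behave as they were later standardized in ES2015; neither is settled in this thread. The tetragram range U+1D306..U+1D356 and all variable names are only examples.

```javascript
// Assumption: the `u` flag and \u{...} escapes as later standardized in ES2015.
const first = String.fromCodePoint(0x1D306); // TETRAGRAM FOR CENTRE
const last = String.fromCodePoint(0x1D356);  // end of the example range

// Escaped spelling of the range.
const escaped = /^[\u{1D306}-\u{1D356}]$/u;

// Unescaped spelling: the character class contains the characters themselves,
// which are surrogate pairs at the code unit level.
const unescaped = new RegExp("^[" + first + "-" + last + "]$", "u");

const inside = String.fromCodePoint(0x1D320); // a code point within the range
const checks = {
  escapedMatchesInside: escaped.test(inside),
  unescapedMatchesInside: unescaped.test(inside),
  escapedRejectsBmp: escaped.test("A"),
  // A lone high surrogate is a single code point outside the range.
  loneSurrogateRejected: !escaped.test(inside.charAt(0)),
};
console.log(checks);
```

Under those assumptions both spellings describe the same set of code points, so the choice between them is purely one of readability.
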
> That said, why is the JS source suddenly a sequence of UTF-16 code units? I
> believe JS source code should be a sequence of Unicode code points (and I
> think ES5 says something to this effect).
I'm not 100% clear on this point yet, but e.g. the SourceCharacter production
in Annex A.1 is described as "any Unicode code unit."
> The underlying transport format should not be a concern for the JS lexer.
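eval, for one: its argument is a JS string, so the lexer receives a sequence
of UTF-16 code units there.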
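
The eval case can be sketched as follows. The argument to eval is an ordinary JS string, so a non-BMP character inside it reaches the lexer as two UTF-16 code units. The snippet relies on `String.fromCodePoint` and `codePointAt`, which postdate this thread (ES2015), and the variable names are illustrative.

```javascript
// Build the source text of a string literal containing U+1D306.
const source = '"' + String.fromCodePoint(0x1D306) + '"';

// List the code units that eval will actually be handed.
const units = [];
for (let i = 0; i < source.length; i++) {
  units.push(source.charCodeAt(i).toString(16));
}
console.log(units); // quote, high surrogate, low surrogate, quote

// Evaluating the source yields a one-character string stored as two code units.
const value = eval(source);
console.log(value.length, value.codePointAt(0).toString(16));
```

Here `units` is `["22", "d834", "df06", "22"]`: the tetragram arrives as the surrogate pair D834 DF06, whatever encoding the original file or network transport used.
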
Dave
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss