Re: Full Unicode based on UTF-16 proposal

David Herman Sat, 24 Mar 2012 14:22:48 -0700

On Mar 24, 2012, at 1:11 PM, Wes Garland wrote:

> On 24 March 2012 15:25, David Herman <[email protected]> wrote:
> > Presumably the JS source, as a sequence of UTF-16 code units, represents 
> > the tetragram code points as surrogate pairs.
> 
> Clarification: the JS source *of the regexp literal*.
> 
> 
> We certainly can, although this means that certain Unicode Strings cannot be 
> matched by a regexp with this flag. These strings would be the ones 
> containing reserved code points.


I didn't mean to imply *only* allowing non-BMP ranges by their unescaped 
representation, just that if it's possible that would often be nice and 
readable. I would certainly expect that we should also allow 
[\u{xxxxx}-\u{yyyyy}].
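
As a hedged illustration of such a range: the sketch below uses the `u` flag and `\u{...}` escapes in the form later standardized in ES2015, which may differ in detail from what was being discussed here. The tetragram range U+1D306 to U+1D356 and the helper names `isTetragram` and `compilesWithoutFlag` are chosen purely for illustration.

```javascript
// Assumption: an ES2015+ engine with the `u` RegExp flag and \u{...} escapes.
// Hypothetical helper: true if `s` is exactly one Tai Xuan Jing tetragram.
function isTetragram(s) {
  return /^[\u{1D306}-\u{1D356}]$/u.test(s);
}

const centre = "\u{1D306}"; // stored as the surrogate pair D834 DF06

console.log(centre.length);       // 2 (UTF-16 code units)
console.log(isTetragram(centre)); // true
console.log(isTetragram("A"));    // false

// Without the flag, the same class written as raw surrogate pairs is read
// code unit by code unit, so the range DF06-D834 is out of order and the
// pattern is rejected with a SyntaxError.
function compilesWithoutFlag() {
  try {
    new RegExp("[\uD834\uDF06-\uD834\uDF56]");
    return true;
  } catch (e) {
    return false;
  }
}

console.log(compilesWithoutFlag()); // false
```

The unescaped form (`/[𝌆-𝍖]/u`) would read even more directly, provided the source file's encoding survives transport.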

> That said, why is the JS source suddenly a sequence of UTF-16 code units? I 
> believe JS source code should be a sequence of Unicode code points (and I 
> think ES5 says something to this effect).

I'm not 100% clear on this point yet, but e.g. the SourceCharacter production 
in Annex A.1 is described as "any Unicode code unit."

> The underlying transport format should not be a concern for the JS lexer.

eval
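
One reading of this reply, sketched under that assumption: the argument to eval is an ordinary JS string, so the lexer receives UTF-16 code units directly and no transport encoding is involved. The helper name `evaluatedCodePoint` is hypothetical, and `codePointAt` assumes an ES2015+ engine.

```javascript
// Build the source text of a string literal from individual code units.
const source = '"' + String.fromCharCode(0xD834, 0xDF06) + '"';

// Hypothetical helper: evaluate `src` and report the first code point of
// the resulting string.
function evaluatedCodePoint(src) {
  return eval(src).codePointAt(0);
}

console.log(source.length);                           // 4 code units
console.log(evaluatedCodePoint(source).toString(16)); // 1d306
```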

Dave

