On 24 March 2012 15:25, David Herman <[email protected]> wrote:

> Presumably the JS source, as a sequence of UTF-16 code units, represents
> the tetragram code points as surrogate pairs.
>
> Clarification: the JS source *of the regexp literal*.

We certainly can, although this means that certain Unicode Strings cannot be matched by a regexp with this flag. These strings would be the ones containing reserved code points.

That said, why is the JS source suddenly a sequence of UTF-16 code units? I believe JS source code should be a sequence of Unicode code points (and I think ES5 says something to this effect). The underlying transport format should not be a concern for the JS lexer.

The lexer should receive a series of code points from the network transport, allowing web sites to transmit JS in whatever encoding they see fit, provided the browser and server can both agree on it. I think UTF-8 would make a fine transport format for JS source code.

IMHO the transport format between the browser and the JS lexer [i.e. the input program encoding] should be allowed to be implementation-defined and not specified by TC-39.

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
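
To make the code unit versus code point distinction concrete, here is a minimal sketch. It assumes a modern runtime (ES2015 or later, with a global `TextDecoder`), none of which existed when this thread was written, and it picks U+1D306 (TETRAGRAM FOR CENTRE) as an illustrative tetragram; the variable names are made up for the example. It shows the same character as two UTF-16 code units, as one code point, and as four UTF-8 bytes arriving over a hypothetical transport.

```javascript
// U+1D306 written as a UTF-16 surrogate pair (assumed example character).
const tetragram = "\uD834\uDF06";

// Code-unit view: what a lexer sees if source is a sequence of UTF-16 units.
const codeUnits = [];
for (let i = 0; i < tetragram.length; i++) {
  codeUnits.push(tetragram.charCodeAt(i).toString(16));
}

// Code-point view: what a lexer sees if source is a sequence of code points.
// Array.from iterates a string by code point, not by code unit.
const codePoints = Array.from(tetragram, (ch) => ch.codePointAt(0).toString(16));

// Transport view: the same character as UTF-8 bytes on the wire.
const utf8Bytes = new Uint8Array([0xf0, 0x9d, 0x8c, 0x86]);
const decoded = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(codeUnits);             // two entries: the surrogate pair
console.log(codePoints);            // one entry: the code point
console.log(decoded === tetragram); // the transport encoding does not change the character
```

The point of the sketch is only that the byte encoding used for transport and the unit the lexer works in are separable choices; it does not show how any particular engine handles source text.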

