I have updated my proposal for Supplementary Characters in ECMAScript [1] based 
on feedback from the es-discuss@ mailing list [2] and the TC 39 meeting in 
March [3]. This updated version generally reflects the consensus reached at the 
meeting, but provides more detail. Changes are listed in the Updates section.

The proposal keeps UTF-16 as the encoding for source text and String values in 
ECMAScript, but updates the specification of functionality that interprets them 
so that it treats them as sequences of Unicode code points, thus enabling 
support for the full Unicode character set. In the case of regular expressions, 
this requires the introduction of a new Unicode mode.
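To illustrate the difference between reading a String as UTF-16 code units and reading it as code points, here is a minimal TypeScript sketch. It is an illustration only, not text from the proposal. The function name `toCodePoints` is hypothetical. It decodes surrogate pairs by hand and passes unpaired surrogates through unchanged, which is one reasonable convention among several.

```typescript
// Decode a UTF-16 string (as stored in an ECMAScript String value) into
// Unicode code points. A high surrogate (U+D800..U+DBFF) immediately followed
// by a low surrogate (U+DC00..U+DFFF) forms one supplementary code point;
// any unpaired surrogate is returned as its own code point.
function toCodePoints(s: string): number[] {
  const result: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Combine the pair into one code point in U+10000..U+10FFFF.
        result.push((unit - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000);
        i++; // skip the low surrogate just consumed
        continue;
      }
    }
    result.push(unit);
  }
  return result;
}

// "a" followed by the supplementary code point U+1D49C, written as a surrogate pair.
const s = "a\uD835\uDC9C";
console.log(s.length);                        // 3 (UTF-16 code units)
console.log(toCodePoints(s).length);          // 2 (code points)
console.log(toCodePoints(s)[1].toString(16)); // "1d49c"
```

Code-unit-based functionality sees three "characters" in this string. Code-point-based functionality sees two.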

There's one change from what was previously discussed: We had discussed using 
full Unicode case folding in the Unicode mode of regular expressions, including 
mappings that map a single code point to a sequence of code points, such as "ß" 
-> "ss". In trying to integrate this into the spec, I found that Unicode 
Technical Standard 18, Unicode Regular Expressions [4], doesn't completely 
specify the interpretation of such mappings, for example in character classes. 
For now, I have therefore reverted to simple Unicode case folding, which maps 
each code point to a single code point, and have sent feedback to the Unicode 
Consortium requesting clarification.
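The difference between the two foldings can be sketched with a tiny, hand-picked excerpt of the Unicode CaseFolding.txt data. The tables and function names below are hypothetical; a real engine would load the complete file. Simple folding uses the entries with status C and S and always yields exactly one code point. Full folding uses the entries with status C and F and may yield several code points. For example, under full folding, a character class such as [ß] would have to match the two-character sequence "ss".

```typescript
// Hypothetical excerpt of CaseFolding.txt, for illustration only.
// Status C (common) entries are used by both simple and full folding.
const COMMON = new Map<number, number>([
  [0x41, 0x61], // 'A' -> 'a'
  [0x53, 0x73], // 'S' -> 's'
]);
// Status S entries: used by simple folding only (one code point -> one).
const SIMPLE_ONLY = new Map<number, number>([
  [0x1e9e, 0xdf], // LATIN CAPITAL LETTER SHARP S -> 'ß'
]);
// Status F entries: used by full folding only (one code point -> several).
const FULL_ONLY = new Map<number, number[]>([
  [0xdf, [0x73, 0x73]],   // 'ß' -> "ss"
  [0x1e9e, [0x73, 0x73]], // LATIN CAPITAL LETTER SHARP S -> "ss"
]);

// Simple case folding: C + S entries; unmapped code points fold to themselves.
function simpleFold(cp: number): number {
  return COMMON.get(cp) ?? SIMPLE_ONLY.get(cp) ?? cp;
}

// Full case folding: C + F entries; the result may be a sequence.
function fullFold(cp: number): number[] {
  const common = COMMON.get(cp);
  if (common !== undefined) return [common];
  return FULL_ONLY.get(cp) ?? [cp];
}

const fold = (cps: number[]) => cps.map((c) => String.fromCharCode(c)).join("");
console.log(fold([simpleFold(0xdf)])); // "ß" (unchanged: one code point)
console.log(fold(fullFold(0xdf)));     // "ss" (two code points)
```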

To those on public-script-coord@ but not on es-discuss@, my apologies for not 
keeping you in the loop after the first round of discussions in February/March. 
I'll try to pay more attention to this list in the future.

Best regards,
Norbert

[1] http://norbertlindenberg.com/2012/05/ecmascript-supplementary-characters/index.html
[2] https://mail.mozilla.org/pipermail/es-discuss/2012-March/thread.html#21620
[3] https://mail.mozilla.org/pipermail/es-discuss/2012-March/thread.html#21919
[4] http://unicode.org/reports/tr18/#Default_Loose_Matches
_______________________________________________
es-discuss mailing list
https://mail.mozilla.org/listinfo/es-discuss