Re: Full Unicode strings strawman

Allen Wirfs-Brock Mon, 16 May 2011 15:07:13 -0700

On May 16, 2011, at 2:16 PM, Mike Samuel wrote:

> 2011/5/16 Boris Zbarsky <[email protected]>:
>> On 5/16/11 4:37 PM, Mike Samuel wrote:
>>> 
>>> 
> 
>> There is no Unicode codepoint U+D800 or U+DC00.  See
>> http://www.unicode.org/charts/PDF/UD800.pdf and
>> http://www.unicode.org/charts/PDF/UDC00.pdf which clearly say that there are
>> no Unicode characters with those codepoints.
> 
> Correct.
> The strawman says
> 
> "The String type is the set of all finite ordered sequences of zero or
> more 21-bit unsigned integer values (“elements”)."
> 
> There is no exclusion for invalid code-points, so I was assuming when
> Allen talked about an encodeUTF16 function that he was purposely
> fuzzing the term "codepoint" to include the entire range, and that
> encodeUTF16(oneSupplemental).charCodeAt(0) === 0xd800.


Correct. In my proposal, ES string elements are 21-bit values.  All possible 
values are usable even though some are not valid Unicode code points. We may 
not yet have a clear common language for referring to such element values.  In 
current ES we call them "character codes", but we need to be careful about 
carrying that terminology forward because it occurs in APIs that depend upon 
character codes being 16-bit values.

encodeUTF16 is a Unicode domain-specific function.  It would need to define 
what it does when it encounters a "character code" that is not a valid code 
point.
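
To make that open question concrete, here is a rough sketch of what such a
function might look like. Everything in it is an assumption for illustration:
the strawman does not define a signature, so the elements are modeled as a
plain array of integers (current ES strings cannot hold 21-bit elements), and
the onInvalid parameter with its "replace", "passthrough" and "throw" values
is hypothetical. It only shows that some policy has to be chosen for
surrogate values and for values above 0x10FFFF.

```javascript
// Hypothetical sketch, not part of the strawman.
// elements: array of integer element values (up to 21 bits each).
// onInvalid: assumed policy for values that are not valid code points.
function encodeUTF16(elements, onInvalid = "replace") {
  const units = [];
  for (const el of elements) {
    const isSurrogate = el >= 0xD800 && el <= 0xDFFF;
    const outOfRange = el < 0 || el > 0x10FFFF;
    if (isSurrogate || outOfRange) {
      if (onInvalid === "throw") {
        throw new RangeError("not a valid code point: 0x" + el.toString(16));
      }
      if (onInvalid === "passthrough" && isSurrogate) {
        // A lone surrogate value still fits in one 16-bit unit.
        units.push(el);
      } else {
        // Substitute U+FFFD REPLACEMENT CHARACTER.
        units.push(0xFFFD);
      }
      continue;
    }
    if (el <= 0xFFFF) {
      // BMP code points map to a single 16-bit unit.
      units.push(el);
    } else {
      // Supplementary code points map to a surrogate pair.
      const offset = el - 0x10000;
      units.push(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
    }
  }
  return units;
}
```

Under these assumptions, encodeUTF16([0x10000])[0] is 0xD800, which matches
the expectation Mike describes above for a single supplementary element.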

Allen

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
