On May 16, 2011, at 2:16 PM, Mike Samuel wrote:
> 2011/5/16 Boris Zbarsky <[email protected]>:
>> On 5/16/11 4:37 PM, Mike Samuel wrote:
>>>
>>>
>
>> There is no Unicode codepoint U+D800 or U+DC00. See
>> http://www.unicode.org/charts/PDF/UD800.pdf and
>> http://www.unicode.org/charts/PDF/UDC00.pdf which clearly say that there are
>> no Unicode characters with those codepoints.
>
> Correct.
> The strawman says
>
> "The String type is the set of all finite ordered sequences of zero or
> more 21-bit unsigned integer values (“elements”)."
>
> There is no exclusion for invalid code-points, so I was assuming when
> Allen talked about an encodeUTF16 function that he was purposely
> fuzzing the term "codepoint" to include the entire range, and that
> encodeUTF16(oneSupplemental).charCodeAt(0) === 0xd800.
Correct. In my proposal, ES string elements are 21-bit values. All possible
values are usable even though some are not valid Unicode code points. We may
not have a clear common language yet for referring to such element values. In
current ES we call them "character codes", but we need to be careful about
moving that terminology forward because it occurs in APIs that depend upon
character codes being 16-bit values.
encodeUTF16 is a Unicode-specific function. It would need to define what it
does when it encounters a "character code" that is not a valid code point.
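
To make that open question concrete, here is a minimal sketch and not part of
the strawman. It assumes a strawman string can be modeled as an array of
21-bit integers, and that encodeUTF16 returns an array of 16-bit code units.
The onInvalid parameter and its three policies ("passthrough", "replace",
"throw") are hypothetical names invented for illustration; the proposal does
not pick one.

```javascript
// Hypothetical sketch: "elements" is an array of 21-bit unsigned integers
// standing in for a strawman string. The result is an array of 16-bit units.
function encodeUTF16(elements, onInvalid = "passthrough") {
  const units = [];
  for (const el of elements) {
    if (!Number.isInteger(el) || el < 0 || el > 0x1FFFFF) {
      throw new RangeError("not a 21-bit element value: " + el);
    }
    const isSurrogate = el >= 0xD800 && el <= 0xDFFF;
    const beyondUnicode = el > 0x10FFFF;
    if (isSurrogate || beyondUnicode) {
      // Not a valid Unicode code point: apply the (assumed) policy.
      if (onInvalid === "replace") {
        units.push(0xFFFD); // U+FFFD REPLACEMENT CHARACTER
        continue;
      }
      if (onInvalid === "throw" || beyondUnicode) {
        // Values above 0x10FFFF have no UTF-16 form, so they cannot pass through.
        throw new RangeError("not a valid code point: 0x" + el.toString(16));
      }
      units.push(el); // "passthrough": emit the lone surrogate unchanged
      continue;
    }
    if (el < 0x10000) {
      units.push(el); // BMP code point: one code unit
    } else {
      const v = el - 0x10000; // supplementary: split 20 bits into a pair
      units.push(0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF));
    }
  }
  return units;
}
```

Under the "passthrough" policy, encodeUTF16([0x10000]) and
encodeUTF16([0xD800, 0xDC00]) both yield [0xD800, 0xDC00], so the encoding is
not reversible for such inputs. That ambiguity is one reason the behavior has
to be specified rather than left implicit.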
Allen
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss