On 5/16/11 4:37 PM, Mike Samuel wrote:
You might have.  If you reject my assertion about option 2 above, then
to clarify,
The UTF-16 representation of codepoint U+10000 is the code-unit pair
U+D8000 U+DC000.

No. The UTF-16 representation of codepoint U+10000 is the code-unit pair 0xD800 0xDC00. These are 16-bit unsigned integers, NOT Unicode characters. The U+NNNNN notation denotes Unicode characters, so it should not be used for code units.
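
To make the arithmetic concrete, here is a minimal sketch of how a code point maps to UTF-16 code units. The function name encodeUtf16 is made up for illustration and is not from any spec or library. Rejecting values in the surrogate range is an assumption that matches the reading given in this message.

```typescript
// Hypothetical helper (name invented for this illustration): encode one
// Unicode code point as an array of 16-bit UTF-16 code units.
function encodeUtf16(codePoint: number): number[] {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a Unicode code point");
  }
  // Assumption: values in the surrogate range are not encodable characters.
  if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
    throw new RangeError("surrogate values cannot be encoded");
  }
  // Code points below U+10000 are a single code unit with the same value.
  if (codePoint < 0x10000) {
    return [codePoint];
  }
  // Everything else is split into a high and a low surrogate code unit,
  // each carrying 10 bits of the offset from 0x10000.
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return [high, low];
}
```

With this sketch, encodeUtf16(0x10000) yields the two integers 0xD800 and 0xDC00, which are code units and not characters.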

The UTF-16 representation of codepoint U+D8000 is the single code-unit
U+D8000 and similarly for U+DC00.

I'm assuming you meant U+D800 in the first two code-units there.

There is no Unicode codepoint U+D800 or U+DC00. See http://www.unicode.org/charts/PDF/UD800.pdf and http://www.unicode.org/charts/PDF/UDC00.pdf which clearly say that there are no Unicode characters with those codepoints.

How can the codepoints U+D800 U+DC00 be distinguished in a DOMString
implementation that uses UTF-16 under the hood from the codepoint
U+10000?

They don't have to be; if 0xD800 0xDC00 are present (in that order) then they encode U+10000. If they're present on their own, it's not a valid UTF-16 string, hence not a valid DOMString and some sort of error-handling behavior (which presumably needs defining) needs to take place.
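
A rough sketch of that decoding rule follows. The name decodeUtf16 is hypothetical, and throwing on a lone surrogate is only one possible error-handling choice, picked here as an assumption because the actual behavior is exactly what still needs defining.

```typescript
// Hypothetical strict decoder (name invented for this illustration):
// turns UTF-16 code units into code points.
function decodeUtf16(units: number[]): number[] {
  const codePoints: number[] = [];
  for (let i = 0; i < units.length; i++) {
    const unit = units[i];
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // A high surrogate must be followed immediately by a low surrogate.
      const next = units[i + 1];
      if (next !== undefined && next >= 0xdc00 && next <= 0xdfff) {
        codePoints.push(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00));
        i++; // the low surrogate has been consumed as part of the pair
        continue;
      }
      // Assumption: treat an unpaired high surrogate as a hard error.
      throw new Error("unpaired high surrogate at index " + i);
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Assumption: treat an unpaired low surrogate as a hard error.
      throw new Error("unpaired low surrogate at index " + i);
    }
    codePoints.push(unit);
  }
  return codePoints;
}
```

Here decodeUtf16([0xD800, 0xDC00]) gives the single code point 0x10000, while decodeUtf16([0xD800]) and decodeUtf16([0xDC00]) both throw, so the ambiguity in the question never arises.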

That said, defining JS strings and DOMString differently seems like a recipe for serious author confusion (e.g. actually using JS strings as the DOMString binding in ES might be lossy, assigning from JS strings to DOMString might be lossy, etc). It's a minefield.
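
As an illustration of the lossiness, suppose (purely as an assumption, this is not a proposal) that assigning a JS string to a DOMString replaced every unpaired surrogate with U+FFFD. The helper name replaceLoneSurrogates is made up for this sketch.

```typescript
// Hypothetical conversion policy (an assumption for illustration only):
// replace each unpaired surrogate code unit in a JS string with U+FFFD.
function replaceLoneSurrogates(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: copy both code units through unchanged.
        out += s[i] + s[i + 1];
        i++;
        continue;
      }
    }
    // Unpaired surrogates are replaced; everything else is copied as-is.
    out += isHigh || isLow ? "\ufffd" : s[i];
  }
  return out;
}

// Two different JS strings collapse to the same result, so the original
// value cannot be recovered after the conversion.
const fromHigh = replaceLoneSurrogates("\ud800");
const fromLow = replaceLoneSurrogates("\udc00");
console.log(fromHigh === fromLow);
```

Under that assumed policy, a script that writes a string into the DOM and reads it back may get a different string than it wrote.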

-Boris
