#1 can’t happen. There’s no way to get legal input, since any input must be encoded in some form, and since Unicode clearly states that lone values like D800 are illegal in any of the encodings.
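A JavaScript string can still hold a lone surrogate such as D800 as a raw 16-bit code unit. However, such a value cannot appear in any well-formed encoded input. The following TypeScript sketch tests a sequence of UTF-16 code units for that property. The function name `isWellFormedUtf16` is hypothetical and is used only for illustration. It is not a standard API.

```typescript
// Hypothetical helper (not a standard API): returns true only if every
// surrogate code unit in the sequence is part of a valid high/low pair.
function isWellFormedUtf16(units: number[]): boolean {
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    if (u >= 0xd800 && u <= 0xdbff) {
      // High (lead) surrogate: must be immediately followed by a low surrogate.
      if (i + 1 >= units.length) {
        return false;
      }
      const next = units[i + 1];
      if (next < 0xdc00 || next > 0xdfff) {
        return false;
      }
      i++; // skip the low surrogate that was just validated
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      // Low (trail) surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

console.log(isWellFormedUtf16([0x0041]));         // true  ("A")
console.log(isWellFormedUtf16([0xd83d, 0xde00])); // true  (U+1F600 as a pair)
console.log(isWellFormedUtf16([0xd800]));         // false (lone D800)
console.log(isWellFormedUtf16([0xde00, 0xd83d])); // false (pair in wrong order)
```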
Also, none of the inputs are really UTF-32. We can munge the input from UTF-8 or UTF-16 HTML into something else, but the developer still has it as UTF-8 or UTF-16, so this isn’t much of a burden for them. But we can still allow code point notation (U+10FFFF), which mitigates most of the problem.

-Shawn

From: [email protected] [mailto:[email protected]] On Behalf Of Wes Garland
Sent: Tuesday, May 17, 2011 6:34 PM
To: Boris Zbarsky
Cc: [email protected]
Subject: Re: Full Unicode strings strawman

On 17 May 2011 20:09, Boris Zbarsky <[email protected]> wrote:
> On 5/17/11 5:24 PM, Wes Garland wrote:
>> Okay, I think we have to agree to disagree here. I believe my reading of the spec is correct.
>
> Sorry, but no... how much more clear can the spec get?

In the past, I have read it thus, in pseudo-BNF:

    UnicodeString    => CodeUnitSequence                          // D80
    CodeUnitSequence => CodeUnit | CodeUnitSequence CodeUnit      // D78
    CodeUnit         => <anything in the current encoding form>   // D77

Upon careful re-reading of this part of the specification, I see that D79 is also important. It says that "A Unicode encoding form assigns each Unicode scalar value to a unique code unit sequence," and further clarifies that "The mapping of the set of Unicode scalar values to the set of code unit sequences for a Unicode encoding form is one-to-one."

This means that your original assertion -- that Unicode strings cannot contain the high surrogate code points, regardless of meaning -- is in fact correct. That is unfortunate, as it means that we must either:

1. Allow non-Unicode strings in JS -- i.e. strings composed of all values in the set [0x0, 0x1FFFFF];
2. Keep making programmers pay the raw-UTF-16 representation tax; or
3. Break the String-as-uint16 pattern.

I still believe that #1 is the way forward, and that the problem of round-tripping these values through the DOM is solvable.

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
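D79's one-to-one mapping covers only scalar values: 0x0 through 0x10FFFF, excluding the surrogate range 0xD800 through 0xDFFF. The TypeScript sketch below maps a scalar value to its UTF-16 code units and rejects everything else. It also shows the "raw-UTF-16 representation tax" that option #2 refers to: with String-as-uint16 semantics, a character above U+FFFF counts as two string elements, and code has to pair surrogates by hand. The helpers `toUtf16` and `countCodePoints` are hypothetical names used only for illustration. They are not proposed or standard APIs.

```typescript
// Hypothetical helper: encode one Unicode scalar value as UTF-16 code units.
// Surrogate code points and values above 0x10FFFF (e.g. the extra range in
// option #1's [0x0, 0x1FFFFF]) have no UTF-16 encoding, so this throws.
function toUtf16(cp: number): number[] {
  if (Math.floor(cp) !== cp || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("not a Unicode code point: 0x" + cp.toString(16));
  }
  if (cp >= 0xd800 && cp <= 0xdfff) {
    throw new RangeError("surrogate code point is not a scalar value: 0x" + cp.toString(16));
  }
  if (cp <= 0xffff) {
    return [cp]; // BMP scalar value: a single code unit
  }
  const v = cp - 0x10000; // remaining 20 bits, split 10/10
  return [0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff)];
}

// Hypothetical helper: count code points in a JS string by hand, treating a
// valid high/low surrogate pair as one code point (a lone surrogate counts as one).
function countCodePoints(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // consume the low surrogate as part of this code point
      }
    }
    n++;
  }
  return n;
}

const grin = String.fromCharCode(...toUtf16(0x1f600));
console.log(grin.length);           // 2 (UTF-16 code units)
console.log(countCodePoints(grin)); // 1 (code point)
```

Note that calling `toUtf16(0xd800)` throws, because D800 is not a scalar value. That matches the reading of D79 above.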
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

