On 17 May 2011 20:09, Boris Zbarsky <[email protected]> wrote:

> On 5/17/11 5:24 PM, Wes Garland wrote:
>
>> Okay, I think we have to agree to disagree here. I believe my reading of
>> the spec is correct.
>
> Sorry, but no... how much more clear can the spec get?

In the past, I have read it thus, in pseudo-BNF:
    UnicodeString    => CodeUnitSequence                          // D80
    CodeUnitSequence => CodeUnit | CodeUnitSequence CodeUnit      // D78
    CodeUnit         => <anything in the current encoding form>   // D77

Upon careful rereading of this part of the specification, I see that D79 is also important. It says that "A Unicode encoding form assigns each Unicode scalar value to a unique code unit sequence," and further clarifies that "The mapping of the set of Unicode scalar values to the set of code unit sequences for a Unicode encoding form is one-to-one."

This means that your original assertion (that Unicode strings cannot contain the high surrogate code points, regardless of meaning) is in fact correct. That is unfortunate, as it means we must either:

1. Allow non-Unicode strings in JS, i.e. strings composed of any values in the range [0x0, 0x1FFFFF];
2. Keep making programmers pay the raw-UTF-16 representation tax; or
3. Break the String-as-uint16 pattern.

I still believe that #1 is the way forward, and that the problem of round-tripping these values through the DOM is solvable.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
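The consequence of D79's one-to-one mapping can be made concrete with a small sketch. It assumes UTF-16 as the encoding form, and the helper names isScalarValue, decodeUtf16 and codeUnits are hypothetical, not part of any standard or browser API. The decoder maps a code-unit sequence back to scalar values and rejects any sequence containing an unpaired surrogate, because no scalar value encodes to such a unit. A JS string, by contrast, is just a sequence of 16-bit values and holds "\uD800" without complaint.

```typescript
// Sketch only: isScalarValue, decodeUtf16 and codeUnits are hypothetical
// helper names, and UTF-16 is assumed as the encoding form.

// A Unicode scalar value is any code point in [0, 0x10FFFF] except the
// surrogate range [0xD800, 0xDFFF]. Assumes cp is an integer.
function isScalarValue(cp: number): boolean {
  return (cp >= 0 && cp <= 0xd7ff) || (cp >= 0xe000 && cp <= 0x10ffff);
}

// Decode 16-bit code units into scalar values. Returns null when the input
// is ill-formed UTF-16, i.e. contains a surrogate that is not part of a
// high/low pair: no scalar value encodes to such a sequence.
function decodeUtf16(units: readonly number[]): number[] | null {
  const out: number[] = [];
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    if (u >= 0xd800 && u <= 0xdbff) {
      // High surrogate: the next unit must be a low surrogate.
      if (i + 1 >= units.length) return null;
      const next = units[i + 1];
      if (next < 0xdc00 || next > 0xdfff) return null;
      out.push(0x10000 + ((u - 0xd800) << 10) + (next - 0xdc00));
      i++; // skip the low surrogate just consumed
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      return null; // low surrogate with no preceding high surrogate
    } else {
      out.push(u); // BMP scalar value, encoded as itself
    }
  }
  return out;
}

// A JS string is a sequence of 16-bit values; read them out one by one.
function codeUnits(s: string): number[] {
  const units: number[] = [];
  for (let i = 0; i < s.length; i++) units.push(s.charCodeAt(i));
  return units;
}

// U+1F600 is one scalar value, but .length counts two code units.
const pair = "\uD83D\uDE00";
console.log(pair.length);                      // 2
console.log(decodeUtf16(codeUnits(pair)));     // one element, 0x1F600
// A lone high surrogate is a legal JS string but not well-formed UTF-16.
console.log(decodeUtf16(codeUnits("\uD800"))); // null
console.log(isScalarValue(0xd800));            // false
```

The pair example shows the representation cost mentioned in option 2: one character, two units of .length. The lone-surrogate example shows why such strings fall outside the definition of a Unicode string under this reading.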
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

