Re: Full Unicode strings strawman

Wes Garland Tue, 17 May 2011 18:33:49 -0700

On 17 May 2011 20:09, Boris Zbarsky <[email protected]> wrote:

> On 5/17/11 5:24 PM, Wes Garland wrote:
>
>> Okay, I think we have to agree to disagree here. I believe my reading of
>> the spec is correct.
>>
>
> Sorry, but no...  how much more clear can the spec get?
>
>
In the past, I have read it thus, as pseudo-BNF:


UnicodeString => CodeUnitSequence // D80
CodeUnitSequence => CodeUnit | CodeUnitSequence CodeUnit // D78
CodeUnit => <anything in the current encoding form> // D77

Upon careful re-reading of this part of the specification, I see that D79 is
also important.  It says that "A Unicode encoding form assigns each Unicode
scalar value to a unique code unit sequence.", and further clarifies that
"The mapping of the set of Unicode scalar values to the set of code unit
sequences for a Unicode encoding form is one-to-one."

This means that your original assertion -- that Unicode strings cannot
contain the high surrogate code points, regardless of meaning -- is in fact
correct.
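
To make the D79 reading concrete, here is a minimal sketch of UTF-16 as an encoding form over scalar values only. The function names are hypothetical, chosen for illustration, and the sketch is not taken from the specification text; it only assumes the standard definition of a scalar value (any code point outside U+D800..U+DFFF) and the usual surrogate-pair arithmetic.

```python
from typing import List


def is_scalar_value(cp: int) -> bool:
    """True for code points that are Unicode scalar values,
    i.e. everything in [0, 0x10FFFF] except the surrogate range."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def utf16_code_units(cp: int) -> List[int]:
    """Map one scalar value to its UTF-16 code unit sequence.

    Surrogate code points are rejected: the encoding form assigns
    code unit sequences to scalar values only, so a lone surrogate
    has no encoding of its own."""
    if not is_scalar_value(cp):
        raise ValueError("not a Unicode scalar value: 0x%X" % cp)
    if cp < 0x10000:
        # BMP scalar values encode as a single code unit.
        return [cp]
    # Supplementary scalar values encode as a surrogate pair.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
```

Under these assumptions, `utf16_code_units(0x1F600)` yields the pair 0xD83D, 0xDE00, while `utf16_code_units(0xD800)` raises, which is the sense in which a surrogate code point falls outside the one-to-one mapping.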

Which is unfortunate, as it means that we must either

   1. Allow non-Unicode strings in JS -- i.e. Strings composed of all values
   in the set [0x0, 0x1FFFFF]
   2. Keep making programmers pay the raw-UTF-16 representation tax
   3. Break the String-as-uint16 pattern
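
As a rough illustration of the tax in option 2, the sketch below shows the pairing work a programmer has to do by hand when a string is exposed as raw 16-bit code units. The function name is hypothetical, and passing unpaired surrogates through unchanged is an assumption made for this example, not something the thread prescribes.

```python
from typing import List


def code_points(units: List[int]) -> List[int]:
    """Walk a sequence of 16-bit code units and combine surrogate pairs.

    Assumption for this sketch: an unpaired surrogate is passed
    through as-is rather than rejected or replaced."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the high and low surrogate into one code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out
```

With this helper, the three code units 0x48, 0xD83D, 0xDE00 come back as two values, 0x48 and 0x1F600, so the code-unit length and the code-point length of the same string differ.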

I still believe that #1 is the way forward, and that the problem of
round-tripping these values through the DOM is solvable.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
