#1 can’t happen.  There’s no way to receive such input legally, since any 
input must be encoded in some form, and Unicode clearly states that lone 
values like D800 are illegal in every one of the encoding forms.
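
A minimal sketch of that claim, using Python's built-in strict codecs purely 
as an illustration (Python and the helper names can_encode / can_decode are 
assumptions made here, not something from this thread).  It tries to 
serialize a lone D800 in each encoding form, and then tries to read back the 
byte patterns that would spell it:

```python
# A Python str can hold a lone surrogate in memory, but the strict codecs
# refuse to serialize it, because it is not a Unicode scalar value.
LONE_HIGH_SURROGATE = "\ud800"
LAST_CODE_POINT = "\U0010FFFF"  # a legal scalar value, for comparison


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be serialized with the strict codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if `data` is well-formed input for the strict codec."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


for name in ("utf-8", "utf-16-le", "utf-32-le"):
    print(name,
          "lone D800 encodable:", can_encode(LONE_HIGH_SURROGATE, name),
          "U+10FFFF encodable:", can_encode(LAST_CODE_POINT, name))

# The byte patterns that would spell D800 are rejected on input as well.
print("utf-8 ED A0 80 decodable:", can_decode(b"\xed\xa0\x80", "utf-8"))
print("utf-16-le 00 D8 decodable:", can_decode(b"\x00\xd8", "utf-16-le"))
```

Other environments may be more permissive than these codecs, so this only 
shows what a conforming encoder and decoder are expected to do.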

Also, none of the inputs is really likely to be UTF-32.  We can munge it from 
UTF-8 or UTF-16 HTML to something else, but the developer still has it as 
UTF-8 or UTF-16, so this isn’t much of a burden for them.

But we can still allow code point notation (U+10FFFF), which mitigates most of 
the problem.

-Shawn

From: [email protected] [mailto:[email protected]] On 
Behalf Of Wes Garland
Sent: Tuesday, May 17, 2011 6:34 PM
To: Boris Zbarsky
Cc: [email protected]
Subject: Re: Full Unicode strings strawman

On 17 May 2011 20:09, Boris Zbarsky <[email protected]> wrote:
On 5/17/11 5:24 PM, Wes Garland wrote:
Okay, I think we have to agree to disagree here. I believe my reading of
the spec is correct.

Sorry, but no...  how much more clear can the spec get?

In the past, I have read it thus, pseudo BNF:

UnicodeString => CodeUnitSequence // D80
CodeUnitSequence => CodeUnit | CodeUnitSequence CodeUnit // D78
CodeUnit => <anything in the current encoding form> // D77

Upon careful re-reading of this part of the specification, I see that D79 is 
also important.  It says that "A Unicode encoding form assigns each Unicode 
scalar value to a unique code unit sequence.", and further clarifies that "The 
mapping of the set of Unicode scalar values to the set of code unit sequences 
for a Unicode encoding form is one-to-one."

This means that your original assertion -- that Unicode strings cannot contain 
the high surrogate code points, regardless of meaning -- is in fact correct.
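
To make the D79 reading concrete, here is a small sketch of the UTF-16 
encoding form as a mapping from scalar values to code unit sequences.  The 
function names is_scalar_value and utf16_code_units are hypothetical, picked 
for this illustration, and Python is used only because it is easy to run.  
Surrogate code points are excluded from the domain, so no code unit sequence 
is assigned to them:

```python
from typing import List


def is_scalar_value(cp: int) -> bool:
    """Scalar values are all code points except the surrogates D800..DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def utf16_code_units(cp: int) -> List[int]:
    """Map one scalar value to its unique UTF-16 code unit sequence."""
    if not is_scalar_value(cp):
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]                      # BMP: a single code unit
    offset = cp - 0x10000                # 20 bits, split across two units
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


for cp in (0x41, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X} ->", [f"{unit:04X}" for unit in utf16_code_units(cp)])

try:
    utf16_code_units(0xD800)
except ValueError as err:
    print("rejected:", err)
```

Because the mapping is one-to-one, a sequence such as a lone D800 code unit 
is not the image of any scalar value, which is the point conceded above.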

Which is unfortunate, as it means that we must either

  1.  Allow non-Unicode strings in JS -- i.e. Strings composed of all values in 
the set [0x0, 0x1FFFFF]
  2.  Keep making programmers pay the raw-UTF-16 representation tax
  3.  Break the String-as-uint16 pattern

I still believe that #1 is the way forward, and that the problem of 
round-tripping these values through the DOM is solvable.
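
As a rough illustration of the "raw-UTF-16 representation tax" in option 2, 
the sketch below counts the same text once in code points and once in 16-bit 
code units.  Python is used only for illustration, and utf16_length and 
code_unit_at are hypothetical helpers that imitate what a uint16-based String 
would report; they are not part of any proposal in this thread:

```python
def utf16_length(text: str) -> int:
    """Number of 16-bit code units needed to hold `text` in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def code_unit_at(text: str, index: int) -> int:
    """The 16-bit code unit at `index`, as uint16-style indexing would see it."""
    data = text.encode("utf-16-le")
    return int.from_bytes(data[2 * index:2 * index + 2], "little")


sample = "a\U0001F600b"  # U+1F600 lies outside the BMP

print("code points:", len(sample))
print("code units: ", utf16_length(sample))
# Indexing by code unit lands in the middle of the surrogate pair.
print("unit 1:", hex(code_unit_at(sample, 1)))
print("unit 2:", hex(code_unit_at(sample, 2)))
```

With code unit indexing, the programmer has to notice that units 1 and 2 
belong together and reassemble them by hand.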

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
