Re: Full Unicode strings strawman

Mark Davis ☕ Mon, 16 May 2011 15:06:13 -0700

In practice, surrogate code points don't really cause problems in
Unicode strings. Most implementations just treat them as if they were
unassigned. The only important issue is that *when* they are converted to
UTF-xx for storage or transmission, they need to be handled, typically by
converting to U+FFFD (never just deleted, which is a bad idea for security).
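
A minimal sketch of that conversion step, assuming the string is a sequence of UTF-16 code units (as JS strings are today) and the target is UTF-8. The function name `toUtf8Replacing` is hypothetical, not part of any proposal or library. It only illustrates replacing an unpaired surrogate with U+FFFD rather than dropping it.

```javascript
// Hypothetical helper: encode a JS string as UTF-8 bytes, substituting
// U+FFFD for any unpaired surrogate instead of silently deleting it.
function toUtf8Replacing(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: valid only when a low surrogate follows.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (next - 0xDC00);
        i++; // consume the low surrogate as well
      } else {
        cp = 0xFFFD; // lone high surrogate
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD; // lone low surrogate
    }
    // Standard UTF-8 encoding of the resulting scalar value.
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      bytes.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    } else {
      bytes.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    }
  }
  return bytes;
}
```

With this sketch, `toUtf8Replacing("a\uD800b")` yields the bytes `61 EF BF BD 62`: the lone surrogate becomes the three-byte encoding of U+FFFD, so the surrounding characters are not silently joined together.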


Mark

*— Il meglio è l’inimico del bene —*


On Mon, May 16, 2011 at 14:46, Boris Zbarsky <[email protected]> wrote:

> On 5/16/11 5:16 PM, Mike Samuel wrote:
>
>> The strawman says
>>
>> "The String type is the set of all finite ordered sequences of zero or
>> more 21-bit unsigned integer values (“elements”)."
>>
>
> Yeah, that's not the same thing as an actual Unicode string, and requires
> handling of all sorts of "what if someone sticks non-Unicode in there?"
> issues...
>
> Of course people actually do use JS strings as immutable arrays of 16-bit
> unsigned integers right now (not just as byte arrays), so I suspect that we
> can't easily exclude the surrogate ranges from "strings" without breaking
> existing content...
>
>
> -Boris
> _______________________________________________
> es-discuss mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/es-discuss
>

