RE: Full Unicode strings strawman

Shawn Steele Tue, 17 May 2011 14:30:59 -0700

> The difference is that in UTF-8, 0xed 0xb0 0x88 means "The Unicode code point 
> 0xdc08",
In UTF-8, 0xed 0xb0 0x88 means “Garbage, please replace me with 0xFFFD”.  CESU-8 
allows this, but that sequence is illegal in UTF-8.  The Windows SDK and .NET 
both reject ill-formed UTF-8 sequences for security reasons.  I’m sure you 
can still find other libraries that allow them, but this sequence is ill-formed 
and considered a security threat.  D92 of Unicode 5.0 makes this clear.
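
As a rough illustration of the point above, here is a minimal Python sketch of how a strict decoder might treat 0xed 0xb0 0x88. The helper names are made up for this example. The "surrogatepass" error handler is a Python-specific escape hatch that stands in here for a lenient decoder; it is not CESU-8 itself.

```python
# The three bytes from the thread: a surrogate (0xDC08) encoded as if it
# were an ordinary code point. Well-formed UTF-8 never contains this.
RAW = bytes([0xED, 0xB0, 0x88])


def decode_strict(data):
    """Return the decoded string, or None if the bytes are ill-formed UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_replacing(data):
    """Decode, substituting U+FFFD for anything ill-formed."""
    return data.decode("utf-8", errors="replace")


def decode_lenient(data):
    """Decode while letting encoded surrogates through (Python-specific)."""
    return data.decode("utf-8", errors="surrogatepass")


if __name__ == "__main__":
    print(decode_strict(RAW))                      # strict decoding rejects it
    print([hex(ord(c)) for c in decode_replacing(RAW)])
    print(hex(ord(decode_lenient(RAW))))           # the lone surrogate
```

The strict and replacing paths correspond to the "garbage, replace me" reading; only the deliberately lenient path yields the code point 0xdc08.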

> and in UTF-16 0xdc08 means "Part of some non-BMP code point".

Only if a lead surrogate (0xd800-0xdbff) immediately precedes it.  Otherwise it is also ill-formed.
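
The pairing rule can be sketched as a small check over 16-bit code units. This is only an illustration; the function name is hypothetical and the input is assumed to be a list of integers in the range 0x0000-0xFFFF.

```python
def is_well_formed_utf16(units):
    """Return True if every surrogate in `units` is part of a proper pair.

    `units` is assumed to be a sequence of 16-bit code unit values.
    """
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A lead surrogate must be followed by a trail surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A trail surrogate with no lead before it is ill-formed.
            return False
        i += 1
    return True


if __name__ == "__main__":
    print(is_well_formed_utf16([0xD800, 0xDC08]))  # paired
    print(is_well_formed_utf16([0xDC08]))          # lone trail surrogate
    print(is_well_formed_utf16([0x0041, 0xDC08]))  # preceded by a non-surrogate
```

Only the first call sees 0xdc08 as part of a non-BMP code point; in the other two it is an unpaired surrogate.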
-Shawn

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
