> The difference is that in UTF-8, 0xed 0xb0 0x88 means "The Unicode code point
> 0xdc08",

In UTF-8, 0xed 0xb0 0x88 means "Garbage, please replace me with U+FFFD". CESU-8 allows this sequence, but it is illegal in UTF-8. The Windows SDK and .NET both reject ill-formed UTF-8 sequences for security reasons. I'm sure you can still find other libraries that allow them, but this sequence is ill-formed and considered a security threat. D92 of Unicode 5.0 makes this clear.

> and in UTF-16 0xdc08 means "Part of some non-BMP code point".

Only if there was a 0xd800-0xdbff before it. Otherwise it is also ill-formed.

-Shawn
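
To make the two points above concrete, here is a minimal sketch in Python (chosen only for illustration; it is not from the thread). The helper names are hypothetical. Python's `surrogatepass` error handler stands in for a lenient, CESU-8-style decoder; it is an assumption made for the demo, not a CESU-8 implementation.

```python
# The three bytes discussed above: a surrogate code point (U+DC08)
# encoded with the UTF-8 bit layout, which well-formed UTF-8 forbids.
SEQ = bytes([0xED, 0xB0, 0x88])


def strict_utf8_rejects(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder refuses the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def decode_utf16(units: list[int]) -> str:
    """Decode a list of 16-bit code units as strict UTF-16 (hypothetical helper)."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le")


def utf16_rejects(units: list[int]) -> bool:
    """Return True if a strict UTF-16 decoder refuses the code units."""
    try:
        decode_utf16(units)
    except UnicodeDecodeError:
        return True
    return False


# A replacing decoder turns the ill-formed bytes into U+FFFD.
replaced = SEQ.decode("utf-8", errors="replace")

# A lenient decoder (stand-in for CESU-8-style behavior) lets the
# lone surrogate through.
lenient = SEQ.decode("utf-8", errors="surrogatepass")

if __name__ == "__main__":
    print("strict UTF-8 rejects:", strict_utf8_rejects(SEQ))
    print("with replacement:", ascii(replaced))
    print("lenient decode:", ascii(lenient))
    print("lone 0xDC08 rejected in UTF-16:", utf16_rejects([0xDC08]))
    print("0xD800 0xDC08 in UTF-16:", ascii(decode_utf16([0xD800, 0xDC08])))
```

The last two lines show the UTF-16 side: 0xdc08 is only meaningful when a 0xd800-0xdbff unit precedes it, and a strict decoder rejects it when it stands alone.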
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

