Mark,

Are you Dr. *Mark E. Davis* (born September 13, 1952 (age 58)), co-founder of the Unicode <http://en.wikipedia.org/wiki/Unicode> project and the president of the Unicode Consortium <http://en.wikipedia.org/wiki/Unicode_Consortium> since its incorporation in 1991?
(If so, uh, thanks for giving me alternatives to Shift-JIS, GB-2312, Big-5, et al. Those gave me lots of hair loss in the late 90s.)

On 17 May 2011 21:55, Mark Davis ☕ <[email protected]> wrote:

>> In the past, I have read it thus, pseudo BNF:
>>
>> UnicodeString => CodeUnitSequence // D80
>> CodeUnitSequence => CodeUnit | CodeUnitSequence CodeUnit // D78
>> CodeUnit => <anything in the current encoding form> // D77
>>
>
> So far, so good. In particular, d800 is a code unit for UTF-16, since it is
> a code unit that can occur in some code unit sequence in UTF-16.
>

*head smack* - code unit, not code point.

>> This means that your original assertion -- that Unicode strings cannot
>> contain the high surrogate code points, regardless of meaning -- is in fact
>> correct.
>>
>
> That is incorrect.
>

Aie, Karumba!

If we have
 - a sequence of code points
 - taking on values between 0 and 0x1FFFFF
 - including high surrogates and other reserved values
 - independent of encoding

...what exactly are we talking about? Can it be represented in UTF-16 without round-trip loss when normalization is not performed, for the code points 0 through 0xFFFF?

Incidentally, I think this discussion underscores nicely why we should work hard to figure out a way to hide UTF-16 encoding details from user-end programmers.

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
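
To make the code unit versus code point distinction in this message concrete, here is a minimal sketch in JavaScript. It assumes an engine whose strings are sequences of 16-bit code units, as ES5 describes. The helpers `codeUnits` and `roundTrip` are names made up for illustration and do not come from the thread or from any library.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as hex strings.
function codeUnits(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i).toString(16));
  }
  return out;
}

// Hypothetical helper: rebuild a string from its raw code units,
// with no normalization and no surrogate validation.
function roundTrip(s) {
  var units = [];
  for (var i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i));
  }
  return String.fromCharCode.apply(null, units);
}

// d800 on its own is a valid UTF-16 code unit, so a string can hold it,
// even though it is not a well-formed encoding of any character.
var lone = "a\uD800b";
console.log(lone.length);      // 3
console.log(codeUnits(lone));  // [ '61', 'd800', '62' ]

// U+1D11E lies outside the BMP and occupies two code units (a surrogate pair).
var clef = "\uD834\uDD1E";
console.log(clef.length);      // 2
console.log(codeUnits(clef));  // [ 'd834', 'dd1e' ]

// Copying code unit by code unit preserves the lone surrogate.
console.log(roundTrip(lone) === lone);  // true
```

The sketch only shows that a code unit sequence survives a copy at the code unit level. It says nothing about what happens when such a string is converted to another encoding form, which is where a lone surrogate becomes a problem.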
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

