Thanks, Doug, for the comments.
>And I don't think you're supposed to exclude the surrogate code space (0xD800
>through 0xDFFF) from normal processing. (This is the "D29 conundrum" -- all
>UTFs must support encoding of non-characters, including unpaired surrogates,
>even though UTF-16 cannot do this.) The code you provided encodes unpaired
>surrogates in four bytes -- by pushing them down to the final "else" -- which
>is wrong in any event and almost certainly not what the programmer intended.

Yes, this is a goof. (I wrote a pseudo-code algorithm for going from Unicode scalar values to UTF-8 and assumed "surrogate" USVs are not valid. I wasn't anticipating at the time what a programmer would do with it.)

Any suggestions on the right way to deal with "surrogate" codepoints in this algorithm? They should not occur in the data, but what if they do?

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>
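
For illustration only, here is a minimal sketch of one way the question above could be handled. It is not the pseudo-code under discussion; the function name utf8_encode and the allow_surrogates flag are hypothetical. The one fixed point is that 0xD800 through 0xDFFF lie below 0x10000, so they belong to the three-byte branch and should never fall through to the four-byte one. Whether to reject them or to encode them in three bytes is treated here as a policy choice left to the caller.

```python
def utf8_encode(cp, allow_surrogates=False):
    """Encode a single code point as UTF-8 bytes.

    Hypothetical helper: the name and the allow_surrogates flag are
    assumptions made for this sketch, not part of the original algorithm.
    """
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    if cp < 0x80:
        # One byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Surrogates are checked here, inside the three-byte range,
        # so they can never reach the four-byte branch below.
        if 0xD800 <= cp <= 0xDFFF and not allow_surrogates:
            raise ValueError("unpaired surrogate: U+%04X" % cp)
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

With the default setting, utf8_encode(0xD800) raises ValueError, which matches the assumption that surrogates should not occur in the data. Passing allow_surrogates=True yields the three-byte form instead, along the lines of the quoted comment.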

