Re: YO, ho ho, and a bottle of vodka

Kenneth Whistler Mon, 05 Nov 2001 17:41:33 -0800

Thanks, Doug, for the comments.


>And I don't think you're supposed to exclude the surrogate code space (0xD800
>through 0xDFFF) from normal processing.  (This is the "D29 conundrum" -- all
>UTFs must support encoding of non-characters, including unpaired surrogates,
>even though UTF-16 cannot do this.)  The code you provided encodes unpaired
>surrogates in four bytes -- by pushing them down to the final "else" -- which
>is wrong in any event and almost certainly not what the programmer intended.
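To make the failure concrete, here is a minimal Python sketch. The original pseudo-code is not reproduced in this message, so `encode_buggy` is a hypothetical reconstruction of the branch order Doug describes: the surrogate range is carved out of the three-byte case, so a lone surrogate falls through to the final four-byte "else". For U+D800 that yields F0 8D A0 80, an overlong, ill-formed sequence (a strict decoder rejects 8D after F0), where a three-byte encoding would give ED A0 80.

```python
def encode_buggy(cp: int) -> bytes:
    """Hypothetical reconstruction: surrogates excluded from the 3-byte case."""
    if cp < 0x80:
        return bytes([cp])
    elif cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    elif cp < 0xD800 or 0xE000 <= cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    else:
        # 0xD800..0xDFFF land here along with supplementary code points,
        # producing a 4-byte sequence with a leading zero payload (overlong).
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])

lone = 0xD800
print(encode_buggy(lone).hex())                          # f08da080
# Python's codec, told to let surrogates through, uses 3 bytes:
print(chr(lone).encode("utf-8", "surrogatepass").hex())  # eda080
```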

Yes, this is a goof. (I wrote a pseudo-code algorithm for going from 
Unicode scalar values to UTF-8 and assumed "surrogate" USVs are not valid. 
I wasn't anticipating at the time what a programmer would do with it.)

Any suggestions on the right way to deal with "surrogate" code points
in this algorithm? They should not occur in the data, but what if they do?
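One possible answer, sketched in Python under stated assumptions: decide what to do with U+D800..U+DFFF before choosing an encoding length, so a surrogate can never reach the four-byte branch. The function name `encode_utf8` and its `surrogates` policy switch are hypothetical, not part of the original pseudo-code. "pass" follows the D29 reading Doug describes (a surrogate is encoded like any other BMP value, in three bytes); "strict" (reject) and "replace" (substitute U+FFFD) are common alternatives for data that is not supposed to contain surrogates.

```python
def encode_utf8(cp: int, surrogates: str = "strict") -> bytes:
    """Encode one code point as UTF-8.

    surrogates: hypothetical policy for U+D800..U+DFFF
      "strict"  - raise ValueError
      "pass"    - encode as an ordinary 3-byte BMP value (D29 reading)
      "replace" - substitute U+FFFD REPLACEMENT CHARACTER
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    # Handle surrogates first, so they never fall into the 4-byte branch.
    if 0xD800 <= cp <= 0xDFFF:
        if surrogates == "strict":
            raise ValueError(f"unpaired surrogate: U+{cp:04X}")
        elif surrogates == "replace":
            cp = 0xFFFD
        elif surrogates != "pass":
            raise ValueError(f"unknown policy: {surrogates}")
        # "pass" continues unchanged to the 3-byte branch below.
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

print(encode_utf8(0xD800, "pass").hex())     # eda080
print(encode_utf8(0xD800, "replace").hex())  # efbfbd
```

Python's own codec behaves like the "strict" policy by default and like "pass" with the `surrogatepass` error handler, which makes a convenient cross-check for a sketch like this.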


- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>
