What I had sent out was originally rich text, and I had missed a base-16 subscript, which led to what looks like an error. The line
> C3 = (U mod 100016) \ x40 + x80

should read

> C3 = (U mod x1000) \ x40 + x80

Peter

On 11/14/2001 10:15:06 AM Peter Constable wrote:

>A week or so ago, I asked for comments on a C++ algorithm for converting
>UTF-32 to UTF-8. There were a couple of things pointed out to me that had
>to do with the pseudo-code algorithm I provided to the developer. Here's a
>revised pseudo-code algorithm:
>
>U is a Unicode scalar value; C1, C2, etc. are byte code units in a UTF-8
>sequence; and \ is integer divide.
>
>If U <= U+007F, then
>    C1 = U
>Else if U+0080 <= U <= U+07FF, then
>    C1 = U \ x40 + xC0
>    C2 = U mod x40 + x80
>Else if U+0800 <= U <= U+D7FF, or if U+E000 <= U <= U+FFFF, then
>    C1 = U \ x1000 + xE0
>    C2 = (U mod x1000) \ x40 + x80
>    C3 = U mod x40 + x80
>Else if U >= U+FFFF, then
>    C1 = U \ x40000 + xF0
>    C2 = (U mod x40000) \ x1000 + x80
>    C3 = (U mod 100016) \ x40 + x80
>    C4 = U mod x40 + x80
>Else
>    Error
>End if
>
>- Peter
>
>---------------------------------------------------------------------------
>Peter Constable
>
>Non-Roman Script Initiative, SIL International
>7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
>Tel: +1 972 708 7485
>E-mail: <[EMAIL PROTECTED]>
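For comparison, here is a minimal C++ sketch of the revised pseudo-code with the corrected C3 line, reading every "x" constant as hexadecimal. It is not the C++ implementation discussed earlier in the thread; the function name `encode_utf8` and the `Bytes` return type are hypothetical. It also fills in two points the pseudo-code leaves open, both assumptions: the four-byte branch is limited to U+10000..U+10FFFF (as written, "U >= U+FFFF" overlaps the three-byte branch at U+FFFF and has no upper bound), and the Error case, which catches the surrogates U+D800..U+DFFF, is reported by throwing `std::invalid_argument`.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: encodes one Unicode scalar value U as UTF-8,
// following the revised pseudo-code ("\" is integer divide, "mod" is %).
Bytes encode_utf8(std::uint32_t U) {
    auto b = [](std::uint32_t v) { return static_cast<std::uint8_t>(v); };

    if (U <= 0x7F) {
        return {b(U)};                                  // C1
    } else if (U <= 0x7FF) {
        return {b(U / 0x40 + 0xC0),                     // C1
                b(U % 0x40 + 0x80)};                    // C2
    } else if (U <= 0xD7FF || (U >= 0xE000 && U <= 0xFFFF)) {
        return {b(U / 0x1000 + 0xE0),                   // C1
                b((U % 0x1000) / 0x40 + 0x80),          // C2
                b(U % 0x40 + 0x80)};                    // C3
    } else if (U >= 0x10000 && U <= 0x10FFFF) {
        // Assumption: range bounded to U+10000..U+10FFFF.
        return {b(U / 0x40000 + 0xF0),                  // C1
                b((U % 0x40000) / 0x1000 + 0x80),       // C2
                b((U % 0x1000) / 0x40 + 0x80),          // C3: mod x1000, not 100016
                b(U % 0x40 + 0x80)};                    // C4
    }
    // Error case: surrogates U+D800..U+DFFF and values above U+10FFFF.
    throw std::invalid_argument("not a Unicode scalar value");
}
```

For example, U+20AC EURO SIGN encodes as E2 82 AC and U+1F600 as F0 9F 98 80. Reading the garbled constant as decimal 100016 would instead give C3 values that are not valid continuation bytes in general, which is why the missing subscript matters.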

