Re: USV to UTF-8 mapping

Peter_Constable Wed, 14 Nov 2001 20:48:19 -0800

What I had sent out was originally rich text, and I had missed a base-16 
subscript, which led to what looks like an error. The line


>        C3 = (U mod 100016) \ x40 + x80

should read

>        C3 = (U mod x1000) \ x40 + x80
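As a quick check of the corrected line: in a four-byte sequence, C3 should carry bits 6 through 11 of U. The sketch below compares (U mod x1000) \ x40 + x80 with the equivalent shift-and-mask form, and with byte index 2 of Python's own UTF-8 encoding, for every supplementary code point. The helper names `c3_corrected` and `c3_bits` are illustrative only.

```python
# Third byte of a four-byte sequence, as in the corrected line:
# (U mod x1000) \ x40 + x80, with \ written as Python's // operator.
def c3_corrected(u: int) -> int:
    return (u % 0x1000) // 0x40 + 0x80

# The same byte via shift and mask: bits 6..11 of U, tagged 10xxxxxx.
def c3_bits(u: int) -> int:
    return ((u >> 6) & 0x3F) | 0x80

# Compare both forms with Python's encoder (index 2 of the encoded bytes is C3)
# across U+10000..U+10FFFF.
for u in range(0x10000, 0x110000):
    assert c3_corrected(u) == c3_bits(u) == chr(u).encode("utf-8")[2]

print(hex(c3_corrected(0x1F600)))  # U+1F600 is F0 9F 98 80, so this prints 0x98
```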

Peter



On 11/14/2001 10:15:06 AM Peter Constable wrote:

>A week or so ago, I asked for comments on a C++ algorithm for converting
>UTF-32 to UTF-8. There were a couple of things pointed out to me that had
>to do with the pseudo-code algorithm I provided to the developer. Here's a
>revised pseudo-code algorithm:
>
>U is a Unicode scalar value; C1, C2, etc. are byte code units in a UTF-8
>sequence; and \ is integer divide.
>
>If U <= U+007F, then
>        C1 = U
>Else if U+0080 <= U <= U+07FF, then
>        C1 = U \ x40 + xC0
>        C2 = U mod x40 + x80
>Else if U+0800 <= U <= U+D7FF, or if U+E000 <= U <= U+FFFF, then
>        C1 = U \ x1000 + xE0
>        C2 = (U mod x1000) \ x40 + x80
>        C3 = U mod x40 + x80
>Else if U+10000 <= U <= U+10FFFF, then
>        C1 = U \ x40000 + xF0
>        C2 = (U mod x40000) \ x1000 + x80
>        C3 = (U mod 100016) \ x40 + x80
>        C4 = U mod x40 + x80
>Else
>        Error
>End if
>
>
>- Peter
>
>
>---------------------------------------------------------------------------
>Peter Constable
>
>Non-Roman Script Initiative, SIL International
>7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
>Tel: +1 972 708 7485
>E-mail: <[EMAIL PROTECTED]>
>
>
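For illustration, here is a minimal Python sketch of the same branch structure (it is not the C++ code discussed in the thread). It assumes the four-byte branch covers exactly U+10000..U+10FFFF and uses the corrected C3 term (U mod x1000) \ x40 + x80. The function name `usv_to_utf8` is hypothetical, and raising `ValueError` stands in for the pseudo-code's "Error".

```python
def usv_to_utf8(u: int) -> bytes:
    # Each branch mirrors the pseudo-code; its \ (integer divide) is written //.
    if 0 <= u <= 0x7F:
        return bytes([u])
    if 0x80 <= u <= 0x7FF:
        return bytes([u // 0x40 + 0xC0,
                      u % 0x40 + 0x80])
    if 0x800 <= u <= 0xD7FF or 0xE000 <= u <= 0xFFFF:
        return bytes([u // 0x1000 + 0xE0,
                      (u % 0x1000) // 0x40 + 0x80,
                      u % 0x40 + 0x80])
    # Assumed upper bound: U+10FFFF is the largest Unicode scalar value.
    if 0x10000 <= u <= 0x10FFFF:
        return bytes([u // 0x40000 + 0xF0,
                      (u % 0x40000) // 0x1000 + 0x80,
                      (u % 0x1000) // 0x40 + 0x80,   # corrected C3 term
                      u % 0x40 + 0x80])
    # Surrogates (U+D800..U+DFFF), negative values and values above U+10FFFF
    # are not scalar values, so they fall through to the error case.
    raise ValueError(f"not a Unicode scalar value: {u:#x}")
```

For example, `usv_to_utf8(0x20AC)` returns `b'\xe2\x82\xac'`, and `usv_to_utf8(0xD800)` raises `ValueError`. Comparing the output with `chr(u).encode("utf-8")` over all scalar values is an easy way to test a port of the algorithm.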
