On 2015-12-10 16:06, Mike Schwab wrote:
> https://en.wikipedia.org/wiki/UTF-8
> B'0.......' is an 8 bit ASCII character.
>
ITYM 7 bit.  (Well, maybe.)

> B'110.....' is a 16 bit UTF character.
>

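(Or, perhaps, only 11 bits of Unicode: 5 in the lead byte, 6 in the
continuation byte.)

> B'1110....' is a 24 bit UTF character.
>
(Or, perhaps, only 16 bits of Unicode: 4 in the lead byte, 6 in each
of the two continuation bytes.)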
Etc.

> B'11110...' is a 32 bit UTF character.
> B'111110..' could be a 40 bit UTF character (none established).
> B'1111110.' could be a 48 bit UTF character (none established).
> B'11111110' could be a 56 bit UTF character (none established).
> B'11111111' could be a 64 bit UTF character (none established).
> B'10......' is a continuation UTF character after a previous leading 
> character.
> B'10000000' is a padding UTF character and should be removed.
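
To make the distinction between the size of an encoded sequence and the number of bits that carry code point data concrete, here is a minimal Python sketch. The function name utf8_lead_info is made up for illustration. It classifies a byte by its prefix pattern only, and it assumes the 4-byte limit of current UTF-8 (RFC 3629), so the longer patterns listed above are reported as not being lead bytes. It does not reject the lead bytes that can only start overlong or out-of-range sequences (0xC0, 0xC1, 0xF5 to 0xF7).

```python
def utf8_lead_info(byte):
    """Classify one byte by its UTF-8 prefix pattern.

    Returns (sequence_length_in_bytes, payload_bits) for a lead byte,
    or None for a continuation byte (10xxxxxx) or any pattern longer
    than 11110xxx. Only the bit pattern is checked.
    """
    # (mask, expected prefix, sequence length, payload bits in lead byte)
    patterns = (
        (0x80, 0x00, 1, 7),  # 0xxxxxxx
        (0xE0, 0xC0, 2, 5),  # 110xxxxx
        (0xF0, 0xE0, 3, 4),  # 1110xxxx
        (0xF8, 0xF0, 4, 3),  # 11110xxx
    )
    for mask, prefix, length, lead_bits in patterns:
        if byte & mask == prefix:
            # Each continuation byte contributes 6 payload bits.
            return (length, lead_bits + 6 * (length - 1))
    return None


if __name__ == "__main__":
    # Cross-check the sequence length against Python's own encoder.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        encoded = ch.encode("utf-8")
        length, bits = utf8_lead_info(encoded[0])
        assert length == len(encoded)
        print(f"U+{ord(ch):04X}: {length} byte(s), {bits} payload bits")
```

So a 2-byte sequence occupies 16 bits but carries 11, a 3-byte sequence occupies 24 but carries 16, and a 4-byte sequence occupies 32 but carries 21.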

-- gil
