Re: Surrogate space in Unicode

DougEwell2 Thu, 15 Feb 2001 23:53:50 -0800
In a message dated 2001-02-15 23:15:23 Pacific Standard Time, [EMAIL PROTECTED] 
writes:

>   It has proven difficult to come up with convenient terms for
>   the Unicode characters encoded at U+10000 and beyond.
>   [....]
>   2.  A 'basic' code point, which may represent a 'basic
>   character', can range from U+0000 through U+FFFF.
>  
>  For what purpose is such a distinction needed?  

It is needed because of UTF-16, which requires two 16-bit code units (a 
surrogate pair) to represent a character with a value of U+10000 or higher 
(a supplementary character) but only one 16-bit code unit to represent a 
basic character.
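To make the one-unit/two-unit distinction concrete, here is a minimal Python 
sketch of the UTF-16 surrogate arithmetic. The function names are 
hypothetical, chosen for illustration; the constants (0xD800, 0xDC00, the 
10-bit split) are the standard UTF-16 surrogate ranges.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the pair 0xD834 0xDD1E, 
so the two-character string "A\U0001D11E" needs three 16-bit code units 
even though it contains only two characters.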

Many descriptions on the Web erroneously claim that Unicode contains only the 
first 64K characters of ISO 10646.  Even the Unicode Standard Version 3.0 
states, "Plain Unicode text consists of sequences of 16-bit character codes." 
To me this sentence is very misleading, and it means that special attention 
must be paid to the nature of supplementary characters, both those to be 
assigned in Unicode 3.1 and those to be assigned in future versions.

Because of the widespread belief that Unicode stops at U+FFFF, many fonts and 
applications that claim to support Unicode handle only basic characters, not 
supplementary characters.
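As a hypothetical illustration of how a BMP-only assumption goes wrong, the 
Python sketch below truncates UTF-16 text to a fixed number of 16-bit units. 
The naive version assumes one unit per character and can cut a surrogate 
pair in half; the safer version drops a dangling high surrogate. Both 
function names are invented for this example and do not come from any 
particular application.

```python
def truncate_utf16_naive(text: str, max_units: int) -> bytes:
    """Hypothetical BMP-only truncation: keep the first max_units 16-bit units."""
    return text.encode("utf-16-le")[: 2 * max_units]


def truncate_utf16_safe(text: str, max_units: int) -> bytes:
    """Truncate to at most max_units 16-bit units without splitting a surrogate pair."""
    data = text.encode("utf-16-le")[: 2 * max_units]
    if len(data) >= 2:
        last = int.from_bytes(data[-2:], "little")
        # A high surrogate at the end means its low half was cut off; drop it.
        if 0xD800 <= last <= 0xDBFF:
            data = data[:-2]
    return data
```

With text = "A\U0001D11E" and max_units = 2, the naive result ends in a lone 
high surrogate and fails to decode, while the safe result decodes to "A".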

-Doug Ewell
 Fullerton, California