Re: Unicode

Manuel M. T. Chakravarty Tue, 16 May 2000 18:45:19 -0700
Frank Atanassow <[EMAIL PROTECTED]> wrote,

> George Russell writes:
>  > Marcin 'Qrczak' Kowalczyk wrote:
>  > > As for the language standard: I hope that Char will be allowed or
>  > > required to have >=30 bits instead of current 16; but never more than
>  > > Int, to be able to use ord and chr safely.
>  > Er, does it have to?  The Java Virtual Machine implements Unicode with
>  > 16 bits.  (OK, so I suppose that means it can't cope
>  > with Korean or Chinese.) 
> 
> Just to set the record straight:
> 
> Many CJK (Chinese-Japanese-Korean) characters are
> encodable in 16 bits. I am not so familiar with the
> Chinese or Korean situations, but in Japan there is a
> nationally standardized subset of about 2000 characters
> called the Jyouyou ("often-used") kanji, which newspapers
> and most printed books are mostly supposed to
> respect. These are all strictly contained in the 16-bit
> space. One only needs the additional 16 bits for foreign
> characters (say, Chinese), older literary works and
> such-like. Even then, Japanese has two phonetic
> alphabets as well, and you can usually substitute
> phonetic characters in the place of non-Jyouyou
> kanji---in fact, since these kanji are considered
> difficult, one often _does_ supplement the ideographic
> representation with a phonetic one. Of course, using only
> phonetic characters in such cases would look
> unprofessional in some contexts, and it forces the reader
> to guess at which word was meant...

The problem with restricting yourself to the Jouyou-Kanji is
that you have a hard time with names (of persons and
places).  Many exotic and otherwise unused Kanji are used in
names (for historical reasons), and as the Kanji
representation of a name is the official identifier, it is
rather bad form to write a person's name in Kana (the
phonetic alphabets).
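
To make the 16-bit question concrete, here is a minimal sketch, assuming
a current GHC in which Char ranges over all code points up to U+10FFFF.
The helper fitsIn16Bits is a hypothetical name made up for this
illustration, and the two sample characters are my own picks: U+65E5 is
an everyday kanji, while U+20BB7 is a variant of U+5409 that turns up in
some names and lies outside the 16-bit range.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: True when the code point fits in one 16-bit unit.
fitsIn16Bits :: Char -> Bool
fitsIn16Bits c = ord c <= 0xFFFF

main :: IO ()
main = do
  let common = '\x65E5'   -- U+65E5, a common kanji
      rare   = '\x20BB7'  -- U+20BB7, a name variant of U+5409
  print (fitsIn16Bits common)       -- inside the 16-bit space
  print (fitsIn16Bits rare)         -- needs more than 16 bits
  print (chr (ord rare) == rare)    -- ord/chr round-trip through Int
  print (ord (maxBound :: Char))    -- largest code point Char can hold
```

With a Char of this width the round trip through Int is lossless, which
is the property Marcin asked for; a 16-bit Char could not hold the second
character as a single value.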

Cheers,
Manuel