Re: Unicode, character ambiguities

Glenn Maynard Wed, 09 Jan 2002 00:25:51 -0800

On Wed, Jan 09, 2002 at 04:57:29PM +0900, Tomohiro KUBOTA wrote:
> The most well-known criticism against Unicode is that it unified
> "Han Ideograms" (Kanji) from Chinese, Japanese, and Korean "Han
> Ideograms" (Kanji) with similar shape and origin, though they
> are different "characters".  Even native CJK speakers and CJK
> scholars can have different opinions on a question that "this Kanji
> and that Kanji are different characters, or same characters with
> different shapes?"  Since Unicode takes an opinion which is
> different from most of common Japanese people, Japanese people
> came to generally hate Unicode.  It is natural that scholars have a
> wider variety of opinions than common people, and the Unicode Consortium
> did find a native Japanese scholar who supports Unicode's opinion.
> But the opinion is different from common Japanese people's....
> Thus, Japanese people think Unicode cannot distinguish different
> "characters" from China, Japan, and Korea.  Unicode's view is that
> "these characters are the same characters with different shape (glyph),
> so it should share one codepoint, because Unicode is a _character_
> code, not a _glyph_ code."  This is Han Unification.  Now nobody
> can stand against the political and commercial power of Unicode
> and Japanese people feel helpless....


Alright.  This just confirms most of what I believed.

> Note that I heard that Chinese and Korean people have different
> opinion on Kanji from Japanese.  They think Kanji from China,
> Japan, and Korea are "same character with different shape"
> and they accept Unicode.
> 
> If your software supports only one language at a time, you can
> use Unicode and the problem is only to choose proper font.
> Here, "Japanese font" means a font which has Japanese glyph
> (in Unicode's view) for Han Unification codepoints.  Now, the
> problem is to use Japanese font for Japanese, Chinese font
> for Chinese, and Korean font for Korean.

My suggestion, in the case of Ogg tags, was to add a LANG tag (since
renamed to UTF8_LANG) indicating which language's font the tags should be
displayed in (unless overridden).  This was also added to the proposal.
Japanese users could tell their viewer to ignore this tag and always use
a Japanese font for CJK text.
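
To make the override concrete, here is a minimal sketch in Python of how
a viewer might pick the font language.  The function name, the
user_override setting, and representing the tags as a plain mapping are
all assumptions for illustration; only the UTF8_LANG tag name comes from
the proposal.

```python
from typing import Mapping, Optional


def pick_font_language(tags: Mapping[str, str],
                       user_override: Optional[str] = None) -> Optional[str]:
    """Decide which language's font to use for displaying the tags.

    A user-configured override (e.g. "ja" for a Japanese user) wins over
    the stream's UTF8_LANG tag.  None means neither is present and the
    viewer falls back to whatever font it would use anyway.
    """
    if user_override:
        return user_override
    return tags.get("UTF8_LANG")


# A stream tagged as Chinese, viewed with and without an override.
tags = {"TITLE": "\u6f22\u5b57", "UTF8_LANG": "zh"}
print(pick_font_language(tags))                      # stream's choice
print(pick_font_language(tags, user_override="ja"))  # user's choice
```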
 
> Of course if your software can have language information it is
> great.  mid-sentence language support is excellent!  Usage of
> Japanese font anywhere (I wrote above) is a _compromise_ , so
> it is always welcome to avoid the compromise.

Mid-sentence language support is overkill for tags, unless it's *needed*
for day to day use (and as you said, it isn't.)  It'd be more practical
in a more comprehensive metadata stream.

> There are a few ways to store language information.  Language tags
> above U+E0000, mark-up languages like XML, and so on.  I wonder
> whether "Variation Selectors" in Unicode 3.2 Beta 
> http://www.unicode.org/versions/beta.html
> can be used for this purpose or not....  Does anyone have information?

I hope so.
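
For reference, the language tags above U+E0000 work by mirroring ASCII:
U+E0001 (LANGUAGE TAG) introduces the tag, and each ASCII character of
the language code follows as the code point U+E0000 plus its ASCII
value.  A rough Python sketch, with hypothetical function names and
assuming the tag sits at the very start of the string:

```python
from typing import Optional, Tuple

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_BASE = 0xE0000           # tag character = TAG_BASE + ASCII value


def encode_language_tag(lang: str) -> str:
    """Encode an ASCII language code such as "ja" as tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def split_language_tag(text: str) -> Tuple[Optional[str], str]:
    """Split a leading language tag off text.

    Returns (language code or None, remaining text).
    """
    if not text.startswith(LANGUAGE_TAG):
        return None, text
    i = 1
    # Consume the run of tag characters that follows U+E0001.
    while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
        i += 1
    lang = "".join(chr(ord(c) - TAG_BASE) for c in text[1:i])
    return lang, text[i:]


tagged = encode_language_tag("ja") + "\u6f22\u5b57"
print(split_language_tag(tagged))
```

This only shows the encoding itself.  Whether Variation Selectors can
carry the same information is a separate question that the sketch does
not answer.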

> Note that the internal encoding may be Unicode, but stream I/O
> encoding has to be specified by LC_CTYPE locale.  This is mandatory
> for internationalized software.

Or, in the case of Windows, the system codepage.
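
As a sketch of that split between internal Unicode and locale-dependent
I/O, something like the following Python would do.  The helper names are
made up for illustration.  locale.getpreferredencoding() reports the
LC_CTYPE codeset on Unix-like systems and the ANSI code page on Windows
(unless Python's UTF-8 mode is enabled, in which case it reports UTF-8).

```python
import locale


def stream_encoding() -> str:
    """Return the encoding to use for stream I/O, taken from the locale."""
    try:
        # Adopt the user's LC_CTYPE setting from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale name: keep whatever locale is already active.
        pass
    return locale.getpreferredencoding(False)


def encode_for_output(text: str) -> bytes:
    """Convert internal Unicode text to bytes at the I/O boundary.

    Characters the locale's encoding cannot represent become '?'.
    """
    return text.encode(stream_encoding(), errors="replace")


print(stream_encoding())
print(encode_for_output("\u6f22\u5b57 kanji"))
```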

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
