"Alexander Voropay" wrote on 2001-04-02 17:27 UTC:
>  Could you explain (as native speaker), how many KANJIs are  _really_
> different between C vs J vs K  ?  I think, about 95% are common.

Comment from a non-expert:

I suspect the underlying problem is that people apply different distance
metrics, depending on where they went to school and how familiar they
are with the historic development of these scripts. Unicode followed
more the perception of Chinese and Korean readers, and perhaps also the
perception of many Japanese linguists and other Japanese scholars,
whereas someone trained strictly according to the Japanese ministry of
education guidelines apparently has a somewhat different metric.

The question is ultimately a political one, namely whether the scholars
assembled in the IRG or the Japanese ministry of education have the
"right" view on what defines a distinct Han ideograph and what does not.
Historically, it is no doubt the same script. In practice, there is no
big problem, because Japanese people can simply use Unicode fonts that
strictly follow the Japanese ministry of education guidelines for those
ideographs that are widely used in Japan, and no doubt we will
eventually have a very rich collection of such fonts.

The situation is perhaps somewhat comparable to en_GB versus en_US.
Imagine we introduced the ISCEW (International Standard Code for English
Words). Should "color" and "colour", "night" and "nite", "through" and
"thru" be assigned different code points, or are they the same word in
the end? How about "lift" and "elevator"? If there are already legacy
codes for British and American words around and you do not want to
double the 99% overlapping code space, you will have to agree on some
linguistically sensible unification criteria that simplify sorting,
searching, translation, data entry, etc. (en_GB and en_US are too
trivial an example, because the alphabet is the same, whereas
differences in ideographs are far more subtle due to different
calligraphic styles.)

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/