Alexander Voropay wrote:
> 
>  Could you explain (as native speaker), how many KANJIs are  _really_
> different between C vs J vs K  ?  I think, about 95% are common.

That's hard to answer, without strict criteria.  The IRG has given
their definition, if you look at their unification rules, and the
kZVariant fields in the unihan.txt file.

However, using what Unicode considers to be the "same character", and
looking only at China's GB2312, Taiwan's CNS 11643 planes 1 and 2,
Japan's JIS X 0208 and JIS X 0212, South Korea's KS X 1001, and
Vietnam's TCVN 6056, there are only a little more than 1500 characters
that all of them share (i.e., that map to each other).  Of course, this
figure is artificially low, since it only considers characters that are
in common use on most computers, and does not consider z-variants, e.g.,
'black' is artificially yanked apart into U+9ED1 and U+9ED2, even though
it is a minor change that turns the "v"-like piece in the former to a
horizontal bar in the upper part of the latter.
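
For anyone who wants to try a count like this themselves, here is a
minimal sketch in Python.  It is not the procedure I used; it assumes
the tab-separated "U+XXXX<TAB>field<TAB>value" layout of the Unihan data,
and it assumes source values carry prefixes such as "G0-" or "J1-" (as in
newer releases; older files may be laid out differently).  The prefix
table and the function name are my own illustration, so check them
against your copy of the file.

```python
from collections import defaultdict

# Assumed mapping from Unihan source field to the value prefixes that
# stand for the standards named above. Verify against your Unihan release.
REQUIRED_SOURCES = {
    "kIRG_GSource": ("G0-",),        # GB 2312
    "kIRG_TSource": ("T1-", "T2-"),  # CNS 11643 planes 1 and 2
    "kIRG_JSource": ("J0-", "J1-"),  # JIS X 0208 and JIS X 0212
    "kIRG_KSource": ("K0-",),        # KS X 1001 (KS C 5601 in older data)
    "kIRG_VSource": ("V1-",),        # TCVN 6056
}


def common_code_points(lines, required=REQUIRED_SOURCES):
    """Return the code points that have a matching source in every field.

    `lines` is any iterable of text lines in Unihan's tab-separated format.
    """
    seen = defaultdict(set)  # code point -> set of required fields matched
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # not a "code point, field, value" record
        code_point, field, value = parts[0], parts[1], parts[2]
        prefixes = required.get(field)
        if prefixes is None:
            continue  # a field we are not interested in
        # A value may hold several space-separated source references.
        if any(ref.startswith(prefixes) for ref in value.split()):
            seen[code_point].add(field)
    # Keep only code points matched in all required fields (string-sorted).
    return sorted(cp for cp, fields in seen.items()
                  if len(fields) == len(required))
```

Passing it an open file object for the source-mapping data (in recent
releases that is Unihan_IRGSources.txt; earlier ones ship a single
Unihan.txt) and taking len() of the result gives a count under these
assumptions.  It says nothing about z-variants, which would have to be
folded together separately via kZVariant.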


>  Otherwise, it is possible to introduce new characters in future versions
> of Unocode (ISO-10646) if they has really different glyph. The Unicode
> codespace is big and open :-)

That's not possible.  If you look in virtually any version of the
unihan.txt database file (I'll use the March 27, 2001 version), it says:

  The following fields may be taken as completely accurate and their
  values are *normative* parts of Unicode and ISO/IEC 10646-1 and -2:
  kIRG_GSource, kIRG_TSource, kIRG_JSource, kIRG_KSource, kIRG_VSource

(There is no mention of kIRG_HSource--perhaps the quote is a bit out
of date.)

You also can't disunify without trashing existing implementations.

The same file does not say anything about the status of the four
dictionary mappings kIRGKangXi, kIRGDaiKanwaZiten, kIRGHanyuDaZidian,
and kIRGDaeJaweon.  However, there is a certain order of precedence
among them--a dictionary from 1716 China, a dictionary from mid/late
20th c. Japan, a dictionary from late 20th c. mainland China, and a
dictionary from late 20th c. South Korea.

It's just unfortunate that the Unicode books have always been printed
with a font intended for mainland China (zh_CN) usage, since a lot of
the objections from Japan are really over mainland China's typographical
preferences.  Even in Japan, there are glyph distinctions between
handwritten style and printed style, analogous to the difference
between U+0251 LATIN SMALL LETTER ALPHA and the kind of <a> you would
normally see printed in an English newspaper.  What mainland China has
done is make their glyphs for printing match one particular form in the
handwritten style.  Unfortunately, the ones that were chosen weren't
always ones that a typical Japanese person would recognize.

I daresay that a lot of controversies about Han unification would not
exist, or be lessened, if the books had been printed with a font
supplied from a Japanese source, since it is Japanese who are
generally stricter about differences in form.  To use a particular
example that has been brought up, U+76F4 'straight' has basically
two major forms: 1) one that everyone recognizes, and the only one
that Japanese recognize, and 2) one that only Chinese (from anywhere)
recognize, and which has been made into the standard printed glyph (in
mainland China only).  It's also weird, since looking at the four
dictionaries, it is not until you get to the third one that you find the
form that there are (Japanese) objections to.  I think that creating a
font with glyphs that are as close as possible to the style in the first
(and possibly second) dictionary source should satisfy almost everyone
for plain-text use.

I have a page on this particular character, at:
  http://deall.ohio-state.edu/grads/chan.200/cjkv/u76F4/

I have quickly added some information in section #2 from the
_Dai Kanwa Jiten_--I apologize for the incomplete bibliographic
information and the low quality images (they were from photocopies).
I'd like to point out entry #23145 in that dictionary, which I believe
shows that there are Japanese who are familiar with the so-called
"Chinese form".  (I don't have the same intuitions--Mr. Kubota, do you
think this is the "same" as what you call the "Chinese form"?)

However, it is given a separate entry from #23136 (the mapping for
U+76F4), and that entry even quotes an earlier dictionary that
distinguished it (the _Zhengzi Tong_, a Chinese dictionary from 1671).
I don't know what criteria were used in that dictionary, or even in
the _Dai Kanwa Jiten_, for deciding what is a substantial enough
difference to treat something as not a glyph variant, but this would
appear to be one possible piece of evidence to support claims of
improper unification.


Thomas Chan
[EMAIL PROTECTED]
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
