I also think it would have been wonderful if we could have had CJK
Unification a long time ago and, also, if we could have had Unicode /
ISO/IEC 10646 a long time ago, and not just for or because of CJK Han
characters but also because of all the Latin scripts spread across different
codesets and encodings. In practice, however, it really was not possible
then, and unfortunately, in my opinion, it is still not possible even today;
i.e., I think "everyone on this globe uses only Unicode / ISO/IEC 10646 from
now on and nothing else" is not really realistic or practical, even though
I advocate Unicode all the time.

I guess we will have to accept as a fact that a variety of codesets and
character sets are in use, and try to support as many of them as possible.
Hopefully, over time, people will converge on Unicode / ISO/IEC 10646, since
it makes global computing much easier and more seamless.

I would also like to point out that "CJK unification" is concerned more
with the mappings between characters and their associated code values, so
that a Han character shared by the national standards gets a single, unique
code value in Unicode / ISO/IEC 10646.

The CJK glyph variation issue with Unicode / ISO/IEC 10646 is a different
thing. It's like a word spelled in different ways in, let's say, English and
German, if I stretch things a little: e.g., "Morning" and "Morgen". (A Han
character is a word, as you know.)
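
To make the code-value vs. glyph distinction concrete, here is a minimal
Python sketch. It assumes Python's standard shift_jis, gb2312 and euc_kr
codecs as stand-ins for the Japanese, Chinese and Korean national codesets
(an illustrative choice, not something from the discussion above). Each
codeset encodes the Han character for "one" with its own byte values, yet all
of them decode to the same unified code point, U+4E00; which glyph is drawn
for that code point is left to the font, which is the separate glyph
variation issue.

```python
import unicodedata

# Assumption: Python codec names used as stand-ins for the Japanese,
# Chinese and Korean national codesets.
LEGACY_CODECS = ["shift_jis", "gb2312", "euc_kr"]


def unified_code_point(char: str) -> dict:
    """Encode `char` in each legacy codeset, decode it back, and record
    which Unicode code point each codeset's bytes map to."""
    result = {}
    for codec in LEGACY_CODECS:
        legacy_bytes = char.encode(codec)      # national code value
        decoded = legacy_bytes.decode(codec)   # mapped back to Unicode
        result[codec] = (legacy_bytes, f"U+{ord(decoded):04X}")
    return result


if __name__ == "__main__":
    han = "\u4e00"  # U+4E00, the first CJK Unified Ideograph
    print(unicodedata.name(han))
    for codec, (raw, cp) in unified_code_point(han).items():
        print(f"{codec:10} {raw.hex().upper():6} -> {cp}")
    # The UTF-8 bytes are fixed too, whatever the terminal's locale;
    # only the font decides how the character looks.
    print("utf-8     ", han.encode("utf-8").hex().upper())
```

The same holds for the code points listed in the quoted message below: a
ja_JP.UTF-8 and a zh_CN.UTF-8 terminal receive identical bytes and differ
only in the fonts they pick.
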
With regards,
Ienup
] X-URL: http://www.cl.cam.ac.uk/~mgk25/
] Date: Sat, 03 Feb 2001 16:22:17 +0000
] From: Markus Kuhn <[EMAIL PROTECTED]>
] Subject: Re: CJK Unification
] To: [EMAIL PROTECTED]
] MIME-version: 1.0
]
] Ienup Sung wrote on 2001-02-03 02:42 UTC:
] > Also, as an example, I placed a TIFF file at the following URL
] > that will display different variations of glyphs for same Unicode code
] > point values. (I started two dtterm terminal emulators, one with the
] > ja_JP.UTF-8 locale and the other with the zh_CN.UTF-8 locale, and then ran
] > 'more /usr/pub/UTF-8' in both terminal emulators, paging to U+4E00.)
] >
] > http://ienup.tripod.com/cjk-glyph-variations.tiff
] >
] > Particularly, please note glyphs like U+4E08, U+4E10, U+4E12, U+4E41, U+4E62,
] > U+4EA4, U+4EC8, and so on; they are different.
]
] An essential reference for everyone interested in the subject:
]
] The ISO 10646-1:2000 standard prints the entire CJK collection in 5
] different ideographic fonts and comes with an appendix that documents
] the principles and rationale behind the CJK unification in detail.
]
] Available from
]
] http://www.iso.ch/cate/d29819.html
]
] in PDF on CD-ROM for just 80 CHF (~45 USD).
]
] In my opinion, the CJK unification is something that should already have
] been done much earlier in the 1970s when the first ideographic character
] sets were standardized. Unfortunately, these projects were initiated
] only at a national level and the authors of the standards didn't talk to
] each other even though they worked on a highly overlapping character
] repertoire. Only in the late 1980s, the Chinese standards body started
] to work on CJK unification, which led in the end to the GB 13000 draft,
] which again was later incorporated into Unicode and ISO 10646. (see also
] Unicode 3.0, Appendix A)
]
] Markus
]
] --
] Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
] Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
]
] -
] Linux-UTF8: i18n of Linux on all levels
] Archive: http://mail.nl.linux.org/lists/