sorting order of Kanji

Tomohiro KUBOTA Mon, 25 Feb 2002 16:12:40 -0800

Hi,

At Mon, 25 Feb 2002 17:24:20 -0500,
Glenn Maynard wrote:


> Kanji appear to be getting collated, however:
> 
> 05:13pm [EMAIL PROTECTED]/2 [~] sort
> 日本
> 綺麗
> 日本
> (eof)
> 日本
> 日本
> 綺麗
> 
> (I couldn't tell if that's the correct collation order, but it's clear
> they're being reordered, where the hiragana above are not.)

Kanji cannot be collated by a simple function such as strcoll(),
because in Japanese a single Kanji usually has several readings,
and the correct one depends on the context (the word it appears in).
Choosing the reading automatically would take a natural language
understanding algorithm, so it is virtually impossible in practice.
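
As a rough illustration of the problem (a sketch only, with a tiny
made-up reading table; CHAR_READINGS, WORD_READINGS and reading_key
are hypothetical names, and comparing kana by code point is only an
approximation of real Japanese dictionary order):

```python
# Hypothetical data for illustration only; a real system would need a
# large dictionary plus some way to pick the right entry from context.

# One character, several candidate readings: a per-character table
# cannot say which one applies in a given word.
CHAR_READINGS = {"日": ["にち", "じつ", "ひ", "か"]}

# Readings are only well defined per word.
WORD_READINGS = {
    "日本": "にほん",
    "今日": "きょう",
    "日曜": "にちよう",
    "綺麗": "きれい",
}


def reading_key(word):
    """Sort key: the kana reading of a whole word (assumed to be known)."""
    return WORD_READINGS[word]


words = ["日本", "綺麗", "今日", "日曜"]

# What a strcmp()-style comparison gives: plain code point order.
by_code_point = sorted(words)

# What a reader expects: order by reading.
by_reading = sorted(words, key=reading_key)

print(by_code_point)
print(by_reading)
```

The two lists come out different, and the second one is only
available because the reading of every word was supplied by hand.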

In Korean, a Kanji (Hanja) has a single reading in most cases,
though there are exceptions.  If we ignore those exceptions,
strcoll() could work from a reading table covering all ideographic
characters.  (This is technically possible, but it would need a
large dictionary.)
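
A minimal sketch of such a table-driven key, assuming one reading per
character and ignoring the exceptions (HANJA_READINGS and hanja_key
are hypothetical names, and the table holds only a few entries for
illustration):

```python
# Hypothetical per-character reading table; a real one would have to
# cover every ideographic character.
HANJA_READINGS = {
    "日": "일",
    "本": "본",
    "韓": "한",
    "國": "국",
    "中": "중",
}


def hanja_key(word):
    """Replace each Hanja by its Hangul reading; keep unknown characters as-is."""
    return "".join(HANJA_READINGS.get(ch, ch) for ch in word)


words = ["韓國", "日本", "中國"]

# Precomposed Hangul syllables are laid out in alphabetical order in
# Unicode, so comparing the readings by code point is good enough for
# this sketch.
by_reading = sorted(words, key=hanja_key)

print([hanja_key(w) for w in by_reading])
print(by_reading)
```

Unlike the Japanese case, the key here is computed character by
character, with no knowledge of the surrounding word.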

I don't know about Chinese.

Thus, for Kanji, strcoll() simply behaves like strcmp().
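
For what it is worth, the output quoted above is what a byte-wise
comparison would produce.  A small sketch, assuming the input is
UTF-8 (strcmp_like is a hypothetical stand-in for strcmp(), not the
C function itself):

```python
from functools import cmp_to_key


def strcmp_like(a, b):
    """Compare two strings by their UTF-8 bytes, as unsigned values.

    Returns a negative, zero, or positive number, like strcmp().
    """
    ba, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ba > bb) - (ba < bb)


# The three lines fed to sort in the quoted session.
lines = ["日本", "綺麗", "日本"]
result = sorted(lines, key=cmp_to_key(strcmp_like))

print(result)
```

With UTF-8, byte order and code point order agree, so this is the
same as sorting by code point; no readings are involved.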

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
