Hi,

At Mon, 25 Feb 2002 17:24:20 -0500,
Glenn Maynard wrote:

> Kanji appear to be getting collated, however:
>
> 05:13pm [EMAIL PROTECTED]/2 [~] sort
> 日本
> 綺麗
> 日本
> (eof)
> 日本
> 日本
> 綺麗
>
> (I couldn't tell if that's the correct collation order, but it's clear
> they're being reordered, where the hiragana above are not.)

Kanji cannot be collated by a simple function such as strcoll(),
because in Japanese a single Kanji usually has several readings, and
the right one depends on the context (or on the word it appears in).
Doing this properly is technically next to impossible: it would need
a natural language understanding algorithm.

In Korean, one Kanji (Hanja) has one reading in most cases, though
there are exceptions. If we ignore those exceptions, strcoll() could
work by using a reading table covering all ideographic characters.
(This is technically possible, but it would need a large dictionary.)

I don't know about Chinese.

Thus, for Kanji, strcoll() simply behaves like strcmp().

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
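
The strcoll()/strcmp() point above can be sketched in C. This is only an illustration under stated assumptions: the strings are UTF-8, and the names compare_collated, compare_bytes and sort_words are hypothetical helpers, not part of any library. In the "C" locale strcoll() compares like strcmp(), so both comparators give the same order there. What a Japanese locale does with Kanji depends on the collation data your libc ships.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order two strings by the current LC_COLLATE rules. */
int compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* qsort() comparator: order two strings by raw byte values. */
int compare_bytes(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort an array of strings in place with the
 * given comparator (one of the two above). */
void sort_words(const char **words, size_t count,
                int (*cmp)(const void *, const void *))
{
    assert(words != NULL);
    qsort(words, count, sizeof *words, cmp);
}
```

To try it with your own locale, call setlocale(LC_COLLATE, "") first, then sort the same words once with compare_collated and once with compare_bytes. If the two results are identical for Kanji input, the locale has no reading-based order for them and is falling back to code values.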
