> You do realize that people in CJK locales expect some characters to be
> double width that people in European/American locales expect to be single
> width.
Doublewidth Roman letters are in the Unicode range FF01-FF5E (part of the Halfwidth and Fullwidth Forms block, FF00-FFEF), so when converting from a legacy encoding that assumes the ASCII range is all doublewidth, you map to (ascii + FEE0). With Unicode you can even mix doublewidth and singlewidth "ASCII" in a single document; many of the Roman letters became "kanji" when in doublewidth form (for example, doublewidth capital letter H can mean pornography) and have a different meaning than their single-width brethren. So a Unicode char-cell width function should behave identically in all locales.

(I don't know of any Unicode support for fullwidth Greek or Cyrillic, but should such a thing be needed, there is room north of the BMP.)

> > i was imagining perhaps the difference between O(2 log n) and O(log n)
> > would still be worthwhile :)
> O(2 log n) = O(log n).

yes, and O(1 billion years + log n) == O(log n) too :)

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
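
P.S. To make the (ascii + FEE0) mapping and the locale-independent width idea concrete, here is a rough Python sketch. The names to_fullwidth, cell_width and string_width are made up for illustration. The width rule is an assumption on my part: it treats only the East Asian Width classes W and F as two cells and everything else as one, and it ignores combining and control characters, so it is not a full wcwidth replacement. Mapping the space to U+3000 is also my own addition, since U+FF00 is not an assigned character.

```python
import unicodedata

# Distance between an ASCII character and its fullwidth form:
# U+FF01 (FULLWIDTH EXCLAMATION MARK) - U+0021 ('!') = 0xFEE0.
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(text):
    """Map printable ASCII (U+0021..U+007E) to fullwidth forms (U+FF01..U+FF5E)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x21 <= cp <= 0x7E:
            out.append(chr(cp + FULLWIDTH_OFFSET))
        elif cp == 0x20:
            # Assumption: use IDEOGRAPHIC SPACE, because U+FF00 is unassigned.
            out.append("\u3000")
        else:
            out.append(ch)
    return "".join(out)


def cell_width(ch):
    """Width in character cells, decided by the code point alone, not the locale.

    Simplified: Wide (W) and Fullwidth (F) count as 2, everything else as 1.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def string_width(text):
    """Sum of per-character cell widths."""
    return sum(cell_width(ch) for ch in text)
```

With this, string_width("H") and string_width(to_fullwidth("H")) differ because the two are different characters, not because the locale changed, which is the point above about mixing both forms in one document.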
