Re: perl unicode support

ＳｒｉｎＴｕａｒ Wed, 28 Mar 2007 14:12:56 -0800

And I suspect that most of them will choose to support the
language-specific frills that 40% of the world's population demand. In
fact, I don't know of a single language-specific "frill" that has as
much demand as casing; the non-casing scripts are a pretty diverse
bunch, and the majority share no "frill" as key to them as casing is
to Cyrillic, Latin and Greek.


You are right that basic Latin case folding is pretty important, and probably
deserves its special significance. The Latin alphabet is used all the
time in Japanese and Vietnamese. (In Vietnam, I'm pretty sure they
don't see it as foreign.)

The more advanced, language-specific versions of it (German, Turkish, etc.)
are very tricky though, and rarely done right. I don't think the
standard C library even has a sequence-based toupper/tolower.
Titlecase strikes me as even weirder, because then you have to have
rules to decide what constitutes a word, which is not as trivial as it
sounds and is very language dependent.
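
To make these points concrete, here is a small sketch in Python (chosen
only for illustration; it uses nothing beyond the built-in str methods).
It shows why a character-by-character toupper/tolower cannot be right:
some mappings change the length of the string, the default mappings
ignore Turkish rules, and titlecasing depends on what counts as a word.
The variable names are made up for the example.

```python
# Case mapping is not always one character to one character.
german = "straße"
upper_german = german.upper()            # full mapping: ß becomes SS
length_changed = len(upper_german) != len(german)

# The default (language-independent) lowercase of I is i. That is wrong
# for Turkish, where the lowercase of I is dotless ı (U+0131).
default_lower_I = "I".lower()

# Capital dotted İ (U+0130) lowercases to two code points by default:
# i followed by COMBINING DOT ABOVE (U+0307).
lower_dotted = "İ".lower()

# Titlecasing needs a definition of "word". The built-in one treats the
# apostrophe as a word boundary, which is rarely what an English reader wants.
titled = "they're here".title()

print(upper_german, length_changed)
print(default_lower_I, [hex(ord(c)) for c in lower_dotted])
print(titled)
```

A per-character API such as C's toupper(int) has no way to return "SS"
for a single input character, which is the sequence problem mentioned
above.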

In general, I try to avoid designing anything case-insensitive as much
as possible. But sometimes it's unavoidable, such as when handling
search. In those cases, it's impossible to write one routine that can
handle all languages, because of the incompatibilities between
different systems. Most of the time, doing a basic best effort gets
you by, but it does leave a bad aftertaste.
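
As a rough idea of what such a "basic best effort" might look like, here
is a minimal Python sketch. It assumes that default Unicode case folding
(str.casefold) followed by NFC normalization is an acceptable
language-neutral key; this is a simplification, and the helper names
fold and contains are hypothetical.

```python
import unicodedata

def fold(text):
    """Best-effort, language-neutral search key.

    Applies default Unicode case folding, then NFC normalization.
    Deliberately ignores language-specific rules such as Turkish i.
    """
    return unicodedata.normalize("NFC", text.casefold())

def contains(haystack, needle):
    """Case-insensitive substring test on the folded keys."""
    return fold(needle) in fold(haystack)

print(contains("Hello World", "WORLD"))
print(contains("Die STRASSE ist lang", "straße"))
print(contains("ISPARTA", "ısparta"))
```

The first two searches succeed, but the Turkish one does not: default
folding turns I into i, never into dotless ı. That is the aftertaste.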

Even when making the be-all and end-all language-sensitive case folding
system, I can imagine problems that are unsolvable. How would you
handle this problem: search for a mix of German, English, and Turkish
words in a case-insensitive manner?
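
The conflict can be sketched in a few lines of Python. fold_default is
the standard Unicode folding; fold_turkish is a hypothetical
Turkish-aware variant written just for this example (it maps I to ı and
İ to i before folding). The sample words are likewise only
illustrations.

```python
def fold_default(text):
    # Default Unicode full case folding, as implemented by str.casefold.
    return text.casefold()

def fold_turkish(text):
    # Hypothetical Turkish-aware folding: map I -> ı and İ -> i first,
    # then apply the default folding to everything else.
    return text.replace("I", "ı").replace("İ", "i").casefold()

pairs = [
    ("STRASSE", "straße"),     # German
    ("INTERNET", "internet"),  # English
    ("ISPARTA", "ısparta"),    # Turkish
]

# For each pair: (word, matches under default, matches under Turkish rules)
results = [
    (upper,
     fold_default(upper) == fold_default(lower),
     fold_turkish(upper) == fold_turkish(lower))
    for upper, lower in pairs
]

for row in results:
    print(row)
```

Under the default folding the Turkish pair fails to match; under the
Turkish folding the English pair fails instead. Neither routine gets all
three right unless something outside the text says which language each
word is in.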

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
