But in most cases you have to _think_ in characters; otherwise it's quite unlikely that your application will work correctly.
I'm not quite sure how "thinking in characters" helps an application, in general. I'd be interested if you had a concrete example...
> The only time source code needs to care about
> characters is when it has to layout or format them for display.

No, there are many more situations. Even if your job is as simple as converting text to uppercase, you already have to know what encoding (and actually what locale) is being used.
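For instance (a minimal Perl sketch; the byte string and the encoding names are illustrative), the very same bytes uppercase differently depending on which encoding you decode them with:

use strict;
use warnings;
use Encode qw(decode);
binmode STDOUT, ':encoding(UTF-8)';

my $bytes = "\xC3\xA9";   # UTF-8 for "é", but two separate Latin-1 characters

my $as_utf8   = uc decode('UTF-8',      $bytes);  # "É"
my $as_latin1 = uc decode('ISO-8859-1', $bytes);  # "Ã©" (unchanged: "©" has no case)

print "decoded as UTF-8:   $as_utf8\n";
print "decoded as Latin-1: $as_latin1\n";

Without knowing the encoding, there is simply no right answer for what "uppercase these bytes" means.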
Thinking in characters for that, e.g. calling a function like "toupper" on each one, is broken: there is no guarantee that case folding will maintain a one-to-one mapping of Unicode codepoints. Here you are better off working with whole strings, and when doing so you don't have to think in characters or codepoints at all.
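A concrete case (a Perl sketch, assuming a recent Perl 5 with Unicode string semantics): the German sharp s has no single-codepoint uppercase form in traditional casing, so uppercasing changes the string's length:

use strict;
use warnings;
use utf8;                              # source file is UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# uc() maps the one character "ß" to the two characters "SS",
# so the output string is longer than the input.
my $word = "straße";
print uc($word), "\n";                 # STRASSE
printf "%d chars in, %d chars out\n", length($word), length(uc $word);  # 6, 7

Any code that assumed "uppercase one character, get one character back" breaks here; string-at-a-time code doesn't care.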
Finding a particular letter (especially in case-insensitive mode), performing regexp matching, alphabetical sorting, etc. are just a few trivial examples where you must think in characters.
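Both points fit in a few lines (a Perl sketch; fc() needs Perl 5.16+, and the word list is made up for illustration):

use strict;
use warnings;
use utf8;
use feature 'fc';        # Unicode casefolding, Perl 5.16+
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

# Case-insensitive comparison: casefold whole strings, don't map characters.
print "equal under casefolding\n" if fc("STRASSE") eq fc("straße");

# Alphabetical sorting: a byte-wise sort would put "Zebra" first ("Z" < "a"
# in ASCII) and misplace "Äpfel"; a collation object sorts by character.
my @sorted = Unicode::Collate->new->sort("Zebra", "apple", "Äpfel");
print join(", ", @sorted), "\n";   # Äpfel, apple, Zebra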
It's probably better to use a library regex engine than to re-write custom regex engines all the time. Once you have a regex library that handles codepoints, the code that uses it doesn't have to care about them in particular.
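For example (a sketch using Perl's built-in engine on decoded strings; the sample text is illustrative), once the engine works in codepoints the caller gets character classes and casefolded matching for free:

use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $text = "Crème brûlée";

# \p{L} matches any Unicode letter, not just [A-Za-z]:
my @words = $text =~ /(\p{L}+)/g;
print join("|", @words), "\n";          # Crème|brûlée

# /i casefolds codepoints, so accented letters match across case:
print "matched\n" if $text =~ /BRÛLÉE/i;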
If none of these trivial string operations depends on the encoding, then you don't have to use this feature of Perl, that's all. Simply make sure that neither the file handles nor the strings you concatenate or match against are set to UTF-8, so that you stay in the world of pure bytes.
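Concretely (a minimal sketch; the filename and magic bytes are hypothetical), the byte-only discipline looks like this:

use strict;
use warnings;

# Open with the :raw layer so no encoding translation happens; as long
# as nothing decodes the data, every string stays a plain byte string.
open my $in, '<:raw', 'data.bin' or die "open: $!";   # hypothetical file
read $in, my $buf, 4096;
close $in;

printf "read %d bytes\n", length $buf;                # length() counts bytes here
print "magic found\n" if $buf =~ /\xDE\xAD\xBE\xEF/;  # byte-wise regexp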
The problem is that as soon as you use a library routine that is UTF-8 aware, it sets the UTF-8 flag on a string, and problems start to result. If there were no UTF-8 flag on scalar strings to be set, you could stay in byte world all the time while still using Unicode functionality where you needed it.
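That failure mode, sketched (the strings are illustrative): mixing a byte string with a UTF-8-flagged string silently reinterprets the bytes as Latin-1, mangling data that was really UTF-8:

use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "caf\xC3\xA9";                    # "café" as raw UTF-8 bytes, flag off
my $chars = decode('UTF-8', "na\xC3\xAFve");  # "naïve", UTF-8 flag set

# Concatenation upgrades $bytes as if it were Latin-1, so the byte pair
# \xC3\xA9 turns into the two characters "Ã©" -- classic mojibake.
my $mixed = $bytes . $chars;
print encode('UTF-8', $mixed), "\n";          # "cafÃ©naïve", not "cafénaïve"
printf "utf8 flag on result: %d\n", utf8::is_utf8($mixed) ? 1 : 0;   # 1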