But in most cases you have to _think_ in characters; otherwise it's quite unlikely that your application will work correctly.
I'm not quite sure how "thinking in characters" helps an application, in general. I'd be interested if you had a concrete example...
> The only time source code needs to care about
> characters is when it has to layout or format them for display.

No, there are many more situations. Even if your job is as simple as converting text to uppercase, you already have to know what encoding (and actually what locale) is being used.
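For instance (a minimal Perl sketch; the byte string and the encoding names are illustrative), the very same bytes uppercase differently depending on which encoding you decode them with:

use strict;
use warnings;
use Encode qw(decode);
binmode STDOUT, ':encoding(UTF-8)';

my $bytes = "\xC3\xA9";   # UTF-8 for "é", but two separate Latin-1 characters

my $as_utf8   = uc decode('UTF-8',      $bytes);  # "É"
my $as_latin1 = uc decode('ISO-8859-1', $bytes);  # "Ã©" (unchanged: "©" has no case)

print "decoded as UTF-8:   $as_utf8\n";
print "decoded as Latin-1: $as_latin1\n";

Without knowing the encoding, there is simply no right answer for what "uppercase these bytes" means.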
Thinking in characters for that, e.g. calling a function like "toupper" on each one, is broken: there is no guarantee that case folding will maintain a one-to-one mapping of Unicode codepoints. Here you are better off working with whole strings, and when doing so you don't have to think in characters or codepoints at all.
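A concrete case (a Perl sketch, assuming a recent Perl 5 with Unicode string semantics): the German sharp s has no single-codepoint uppercase form in traditional casing, so uppercasing changes the string's length:

use strict;
use warnings;
use utf8;                              # source file is UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# uc() maps the one character "ß" to the two characters "SS",
# so the output string is longer than the input.
my $word = "straße";
print uc($word), "\n";                 # STRASSE
printf "%d chars in, %d chars out\n", length($word), length(uc $word);  # 6, 7

Any code that assumed "uppercase one character, get one character back" breaks here; string-at-a-time code doesn't care.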
Finding a particular letter (especially in case-insensitive mode), performing regexp matching, alphabetical sorting, etc. are just a few trivial examples where you must think in characters.
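Both points fit in a few lines (a Perl sketch; fc() needs Perl 5.16+, and the word list is made up for illustration):

use strict;
use warnings;
use utf8;
use feature 'fc';        # Unicode casefolding, Perl 5.16+
use Unicode::Collate;
binmode STDOUT, ':encoding(UTF-8)';

# Case-insensitive comparison: casefold whole strings, don't map characters.
print "equal under casefolding\n" if fc("STRASSE") eq fc("straße");

# Alphabetical sorting: a byte-wise sort would put "Zebra" first ("Z" < "a"
# in ASCII) and misplace "Äpfel"; a collation object sorts by character.
my @sorted = Unicode::Collate->new->sort("Zebra", "apple", "Äpfel");
print join(", ", @sorted), "\n";   # Äpfel, apple, Zebra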
It's probably better to use a library regex engine than to re-write custom regex engines all the time. Once you have a regex library that handles codepoints, the code that uses it doesn't have to care about them in particular.
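For example (a sketch using Perl's built-in engine on decoded strings; the sample text is illustrative), once the engine works in codepoints the caller gets character classes and casefolded matching for free:

use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $text = "Crème brûlée";

# \p{L} matches any Unicode letter, not just [A-Za-z]:
my @words = $text =~ /(\p{L}+)/g;
print join("|", @words), "\n";          # Crème|brûlée

# /i casefolds codepoints, so accented letters match across case:
print "matched\n" if $text =~ /BRÛLÉE/i;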
If none of these trivial string operations depends on the encoding, then you don't have to use this feature of Perl, that's all. Simply make sure that neither the file handles nor the strings you concatenate or match against are set to UTF-8, so that you stay in the world of pure bytes.
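Concretely (a minimal sketch; the filename and magic bytes are hypothetical), the byte-only discipline looks like this:

use strict;
use warnings;

# Open with the :raw layer so no encoding translation happens; as long
# as nothing decodes the data, every string stays a plain byte string.
open my $in, '<:raw', 'data.bin' or die "open: $!";   # hypothetical file
read $in, my $buf, 4096;
close $in;

printf "read %d bytes\n", length $buf;                # length() counts bytes here
print "magic found\n" if $buf =~ /\xDE\xAD\xBE\xEF/;  # byte-wise regexp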
The problem is that as soon as you use a library routine that is UTF-8 aware, it sets the UTF-8 flag on a string, and problems start to result. If there were no UTF-8 flag on scalar strings to be set, you could stay in byte world all the time while still using Unicode functionality where you needed it.
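That failure mode, sketched (the strings are illustrative): mixing a byte string with a UTF-8-flagged string silently reinterprets the bytes as Latin-1, mangling data that was really UTF-8:

use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "caf\xC3\xA9";                    # "café" as raw UTF-8 bytes, flag off
my $chars = decode('UTF-8', "na\xC3\xAFve");  # "naïve", UTF-8 flag set

# Concatenation upgrades $bytes as if it were Latin-1, so the byte pair
# \xC3\xA9 turns into the two characters "Ã©" -- classic mojibake.
my $mixed = $bytes . $chars;
print encode('UTF-8', $mixed), "\n";          # "cafÃ©naïve", not "cafénaïve"
printf "utf8 flag on result: %d\n", utf8::is_utf8($mixed) ? 1 : 0;   # 1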