> That would be contradictory to the whole concept of Unicode. A
> human-readable string should never be considered an array of bytes;
> it is an array of characters!

Hrm, I think I would object to that statement. For the overwhelming
majority of programs, strings are simply arrays of bytes, regardless
of encoding. The only time source code needs to care about characters
is when it has to lay out or format them for display.

If perl did not have a "utf-8" bit on its scalars, it would probably
handle utf-8 a lot better and more naturally, imo.
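
To illustrate what I mean, here is a minimal sketch using the core
Encode module (the string is just an example): the same data answers
length() two different ways depending on whether the flag is set.

    use Encode qw(decode);

    my $bytes = "caf\xc3\xa9";            # 5 bytes of utf-8
    my $chars = decode('UTF-8', $bytes);  # same data, utf-8 flag on

    print length($bytes), "\n";           # 5 -- counted as bytes
    print length($chars), "\n";           # 4 -- counted as characters

It is exactly that split personality that trips scripts up.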

Functions and routines which need to know the printable charcell
width, or how to look up glyphs in a font, could easily parse the
codepoints out of the array based on either the locale encoding, or by
simply assuming utf-8 (as is increasingly preferable, imo), then
perform the appropriate formatting lookups.
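
For what it's worth, such a decoder is only a few lines. A rough
sketch, assuming well-formed utf-8 input (the name utf8_codepoints is
mine, and checks for stray continuation bytes and overlong forms are
left out for brevity):

    use strict;
    use warnings;

    # Walk a utf-8 byte string and return its codepoints.
    sub utf8_codepoints {
        my ($bytes) = @_;
        my (@cps, $cp);
        my $more = 0;
        for my $b (unpack 'C*', $bytes) {
            if ($more) {
                $cp = ($cp << 6) | ($b & 0x3F);  # fold in continuation byte
                push @cps, $cp unless --$more;
            } elsif ($b < 0x80) {
                push @cps, $b;                   # single byte (ASCII)
            } elsif (($b & 0xE0) == 0xC0) {
                ($cp, $more) = ($b & 0x1F, 1);   # lead byte, 2-byte form
            } elsif (($b & 0xF0) == 0xE0) {
                ($cp, $more) = ($b & 0x0F, 2);   # lead byte, 3-byte form
            } elsif (($b & 0xF8) == 0xF0) {
                ($cp, $more) = ($b & 0x07, 3);   # lead byte, 4-byte form
            }
        }
        return @cps;
    }

    my @cps = utf8_codepoints("caf\xc3\xa9");    # 99, 97, 102, 233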

Aside from that tiny handful of libraries, no one else should have to
bother with encoding, imo. (Regular expressions supporting utf-8 are
useful as well.)

When I write a basic little perl script that reads in lines from a
file, does trivial string operations on them, then prints them back
out, there should be absolutely no need for my code to give any
special consideration to encoding.
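
Something like this, say (the s/foo/bar/ is only a stand-in for
whatever trivial operation the script actually does):

    #!/usr/bin/perl
    # No encoding layer, no utf-8 flag, nothing.
    use strict;
    use warnings;

    while (my $line = <STDIN>) {
        $line =~ s/foo/bar/g;   # ascii-only edit; multibyte sequences
                                # elsewhere in the line pass through
        print $line;
    }

Any multibyte sequences in the input come out byte-for-byte identical,
which is all the encoding support a script like this should need.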

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
