That would be contradictory to the whole concept of Unicode. A human-readable string should never be considered an array of bytes, it is an array of characters!
Hrm, that statement I think I would object to. For the overwhelming majority of programs, strings are simply arrays of bytes, regardless of encoding. The only time source code needs to care about characters is when it has to lay them out or format them for display. If Perl did not have a "utf-8" bit on its scalars, it would probably handle UTF-8 a lot better and more naturally, imo.

Functions and routines which need to know the printable charcell width, or how to look up glyphs in a font, could easily parse the codepoints out of the array based on either the locale encoding, or by simply assuming UTF-8 (as is increasingly preferable, imo), then perform the appropriate formatting lookups. Aside from that tiny handful of libraries, no one else should have to bother with encoding, imo. (Regular expressions supporting UTF-8 are useful as well.)

When I write a basic little Perl script that reads in lines from a file, does trivial string operations on them, then prints them back out, there should be absolutely no need for my code to make any special considerations for encoding.

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
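[Editorial illustration of the point above: the claim that display-oriented libraries "could easily parse the codepoints out of the array ... by simply assuming utf-8" can be sketched as a small decoder that walks a byte array and extracts codepoints only when they are actually needed. This is a minimal, illustrative sketch, not any particular library's implementation; the function name `utf8_codepoints` is invented for the example, and error handling for truncated or overlong sequences is omitted.]

```python
def utf8_codepoints(data: bytes) -> list[int]:
    """Walk a byte array assumed to be UTF-8 and yield its codepoints.

    Everything else in a program can treat `data` as opaque bytes; only
    code that needs character-level information (glyph lookup, cell
    width, etc.) calls something like this.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # 1-byte sequence: ASCII
            cp, n = b, 1
        elif b >> 5 == 0b110:             # 2-byte sequence
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:            # 3-byte sequence
            cp, n = b & 0x0F, 3
        elif b >> 3 == 0b11110:           # 4-byte sequence
            cp, n = b & 0x07, 4
        else:
            raise ValueError(f"invalid UTF-8 leading byte at offset {i}")
        # Fold in the continuation bytes (6 payload bits each).
        for j in range(1, n):
            cp = (cp << 6) | (data[i + j] & 0x3F)
        out.append(cp)
        i += n
    return out
```

A byte-oriented program only reaches for this at the display boundary; a pass-through script that reads lines, transforms them, and writes them back out never needs to decode at all.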
