I think I see my problem: I was looking for something designed to allow
'THE SYSTEM' to become UTF-8 (i.e. UTF-8 filenames and file data) while
still allowing external interfaces to be controlled, at the character-set
level, by the locale settings.

Unfortunately, it appears that with the standard locale routines there
will have to be three separate representations of every string that ever
hits the disk:

  1) The locale string; eventually this may become UTF-8.
  2) Wide characters.
  3) UTF-8 for interchange on the filesystem.

And the worst thing is that this will continue forever!
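A minimal sketch of that round trip, written in Python rather than C for
brevity: `str` stands in for the wide-character form, and the locale
character set is passed in explicitly (ISO-8859-1 in the examples is an
assumed setting, and the function names are hypothetical). A C program
would go through `mbstowcs()` plus a UTF-8 encoder, or `iconv()`, instead.

```python
def locale_to_fs(raw: bytes, locale_charset: str) -> bytes:
    """Hypothetical helper: locale string (1) -> UTF-8 for the filesystem (3)."""
    wide = raw.decode(locale_charset)   # 1) locale bytes -> 2) wide characters
    return wide.encode("utf-8")         # 2) wide characters -> 3) UTF-8 on disk


def fs_to_locale(stored: bytes, locale_charset: str) -> bytes:
    """Hypothetical helper: UTF-8 from the filesystem (3) -> locale string (1)."""
    wide = stored.decode("utf-8")       # 3) UTF-8 on disk -> 2) wide characters
    # Characters the locale charset cannot represent are replaced with '?'
    return wide.encode(locale_charset, errors="replace")
```

For example, `locale_to_fs(b"caf\xe9", "iso-8859-1")` gives
`b"caf\xc3\xa9"`, and `fs_to_locale` turns it back into `b"caf\xe9"`. Every
string pays for two conversions, and anything outside the locale's
repertoire is lost on the way back out.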

This strikes me as a royal pain, especially as (in theory) even 'cat'
and similar programs that can cross a locale boundary should do this
double conversion, despite the fact that there's no way they can tell
when they _must_ do it.

As all this conversion is essentially impossible without character-set
metadata on every string, there's really no option except to have 'THE
SYSTEM' running with only one character set; this includes not only the
machine itself but each and every display that _might_ be connected.

I suppose it's too late now to fix this ... you know, set it up so that
every program that knows it's connecting to a display device (e.g. uses
readline, curses, slang, ifhp etc.) does the character-set conversion
along with all its other terminal-control jobs.

NB: 'ls' too, as it already knows when its output is a terminal.

That way, when external devices are UTF-8, the null conversion takes no
time and strings are just strings once again; no metadata required.
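A rough sketch of the display-side conversion proposed here, again in
Python for brevity. The helper name `to_display` is hypothetical; the
caller is assumed to pass `os.isatty(fd)` (the same test 'ls' makes) and
the display's character set, which the standard locale routines do not
currently report separately.

```python
import codecs


def to_display(data: bytes, is_tty: bool, display_charset: str) -> bytes:
    """Hypothetical helper: prepare internal UTF-8 data for one output stream."""
    if not is_tty:
        return data          # pipe or file: pass the UTF-8 bytes through untouched
    if codecs.lookup(display_charset).name == "utf-8":
        return data          # UTF-8 display: the null conversion costs nothing
    # Non-UTF-8 display: convert, showing '?' for anything it cannot render
    text = data.decode("utf-8", errors="replace")
    return text.encode(display_charset, errors="replace")
```

Usage would be along the lines of
`os.write(1, to_display(buf, os.isatty(1), charset))`. Only programs that
already talk to a display carry the conversion; 'cat' into a pipe never
has to.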

OTOH, perhaps it's not too late: a variable LC_HOSTCTYPE would be able
to provide the required information, and as soon as the locale library
knows of its existence it can tell any locale-sensitive program that
chars >127 are unprintable (without conversion).
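One way the proposed variable might be interpreted, sketched in Python.
LC_HOSTCTYPE does not exist in any locale library; here it is assumed to
hold the host's character-set name, and `is_printable` is a hypothetical
isprint()-style check that reports bytes above 127 as unprintable whenever
the host and display character sets differ.

```python
import codecs
import os
from typing import Mapping, Optional


def is_printable(b: int, display_charset: str,
                 env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical isprint() variant aware of the proposed LC_HOSTCTYPE."""
    if env is None:
        env = os.environ
    if b < 0x20 or b == 0x7F:
        return False                  # ASCII control characters
    if b <= 127:
        return True                   # printable ASCII is the same everywhere
    host = env.get("LC_HOSTCTYPE")    # assumed to name the host charset
    if not host:
        return True                   # variable unset: keep the old behaviour
    # High bytes are only safe to print if host and display charsets match
    return codecs.lookup(host).name == codecs.lookup(display_charset).name
```

With `LC_HOSTCTYPE=UTF-8` and an ISO-8859-1 display, `is_printable(0xE9,
"iso-8859-1")` returns False, so an unconverted byte is escaped rather
than sent raw to the terminal.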

-- 
Rob.                          (Robert de Bath <robert$ @ debath.co.uk>)
                                       <http://www.cix.co.uk/~mayday>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/