I'd like to have in some standard tool a function that looks at
nl_langinfo(CODESET) and then sends out the corresponding ISO 2022
sequence to make sure the terminal knows about the encoding of the
current locale. This way, setting LANG in .profile followed by calling
this tool will make sure your terminal/console/screen/condom/etc. gets
the message as well.
Which tool in which package should best be extended with that? "reset"
(ncurses), "tset", "stty"? Perhaps this should actually go in .bashrc
into the bash prompt, such that at each display of the prompt, the ISO
2022 state of the terminal is reset to what it should be according to
the locale?
Possible ISO 2022 / ECMA-35 / ECMA-43 sequences for that:
The ISO 2022 ESC sequence for setting the G0 characters (33-126) to
US-ASCII: "\033(B"
The ISO 2022 ESC sequences for setting the G1 characters (160-255) to
various charset standards:
ISO 8859-1 "\033-A"
ISO 8859-2 "\033-B"
ISO 8859-3 "\033-C"
ISO 8859-4 "\033-D"
ISO 8859-5 "\033-L"
ISO 8859-6 "\033-G"
ISO 8859-7 "\033-F"
ISO 8859-8 "\033-H"
ISO 8859-9 "\033-M"
ISO 8859-10 "\033-V"
ISO 8859-13 "\033-Y"
ISO 8859-14 "\033-_"
ISO 8859-15 "\033-b"
The ISO 2022 ESC sequences for switching to and from UTF-8:
switch to UTF-8 "\033%G"
back from UTF-8 "\033%@"
References:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#term
http://www.itscj.ipsj.or.jp/ISO-IR/
http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM
http://www.itscj.ipsj.or.jp/ISO-IR/203.pdf
So if nl_langinfo(CODESET) = "ISO-8859-15", the tool should output
"\033%@\033(B\033-b" (leave UTF-8, set G0 to US-ASCII, set G1 to
ISO 8859-15), whereas for "UTF-8" it should just say "\033%G".
That's easy to add to almost anything that resets or configures a
terminal.
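
A minimal sketch of such a function, written as a standalone Python
script rather than as a patch to any of the tools named above. The names
announce_sequence and G1_FINAL are made up for illustration; the escape
sequences are the ones listed above. The sketch assumes that
nl_langinfo(CODESET) returns names spelled like "ISO-8859-15" (as glibc
does), and that printing nothing for a codeset not in the table is
acceptable behaviour.

```python
import locale
import sys

# Final bytes that designate each ISO 8859 part as G1, copied from the
# table above. The dictionary name is hypothetical.
G1_FINAL = {
    "ISO-8859-1": "A", "ISO-8859-2": "B", "ISO-8859-3": "C",
    "ISO-8859-4": "D", "ISO-8859-5": "L", "ISO-8859-6": "G",
    "ISO-8859-7": "F", "ISO-8859-8": "H", "ISO-8859-9": "M",
    "ISO-8859-10": "V", "ISO-8859-13": "Y", "ISO-8859-14": "_",
    "ISO-8859-15": "b",
}


def announce_sequence(codeset):
    """Return the ISO 2022 sequence for codeset, or "" if it is unknown."""
    name = codeset.upper()
    if name == "UTF-8":
        return "\033%G"
    final = G1_FINAL.get(name)
    if final is None:
        # Assumption: stay silent rather than guess for other codesets.
        return ""
    # Leave UTF-8, set G0 to US-ASCII, then set G1 to the ISO 8859 part.
    return "\033%@" + "\033(B" + "\033-" + final


def main():
    try:
        # Pick up LANG / LC_* from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale: fall back to whatever is currently active.
        pass
    codeset = locale.nl_langinfo(locale.CODESET)
    sys.stdout.write(announce_sequence(codeset))
    sys.stdout.flush()


if __name__ == "__main__":
    main()
```

Called from .profile after LANG has been set, this would emit the
sequence for the current locale. On systems where the codeset name is
spelled differently (e.g. "ISO8859-15"), the lookup would need some
normalisation first.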
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>