Hi,
At Mon, 14 May 2001 09:39:03 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:
> - Do NOT define the environment variable LESSCHARSET. The current
> version of less is able to test the locale and determines automatically
> whether you want Latin-1 or UTF-8. Defining LESSCHARSET is a habit from
> the time where less would by default assume everything is in ASCII,
> which it does not any more. Defining LESSCHARSET deactivates the
> automatic locale-based selection of the character encoding.
Sure. However, we should note that less has to deal with two
encodings: the encoding of the input file and the encoding of the
terminal output. Though the terminal output encoding can be fully
determined by the LC_CTYPE locale, the input file encoding may differ
from the LC_CTYPE encoding.
For example, less with the Japanese patch has a nice feature that
automatically detects which Japanese local encoding (EUC-JP, Shift_JIS,
or ISO-2022-JP) the input file uses. OK, the LC_CTYPE encoding should
be the first candidate when guessing the encoding. However, it may be
nice to have some additional hint information.
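To make the idea concrete, here is a minimal sketch in Python. It is
only an illustration of "locale encoding first, then other candidates,
with an optional hint"; it is not how the Japanese patch for less
actually works. The function names and the `hint` parameter are
hypothetical, and trial decoding is a much cruder heuristic than a
real detector would use.

```python
import codecs

# Candidate encodings named above, as Python codec names.
JAPANESE_CANDIDATES = ["euc_jp", "shift_jis", "iso2022_jp"]


def normalize(name):
    """Map a charset name such as "EUC-JP" to Python's codec name."""
    return codecs.lookup(name).name


def decodes_cleanly(data, encoding):
    """Return True if the bytes decode in this encoding without error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def guess_encoding(data, locale_encoding, hint=None):
    """Guess the encoding of an input file (hypothetical helper).

    Order of trial: explicit hint, then ISO-2022-JP if the data looks
    like it, then the LC_CTYPE encoding, then the other candidates.
    """
    order = []
    if hint is not None:
        order.append(normalize(hint))
    # ISO-2022-JP is 7-bit and switches character sets with ESC
    # sequences. Such data would also pass as EUC-JP or Shift_JIS,
    # so it has to be tried before them.
    if b"\x1b" in data and all(byte < 0x80 for byte in data):
        order.append("iso2022_jp")
    order.append(normalize(locale_encoding))
    order.extend(JAPANESE_CANDIDATES)

    seen = set()
    for encoding in order:
        if encoding in seen:
            continue
        seen.add(encoding)
        if decodes_cleanly(data, encoding):
            return encoding
    return None  # nothing matched; the caller has to fall back
```

A caller would pass the bytes read from the file and the result of
"locale charmap", for example guess_encoding(data, "EUC-JP"). Some
byte sequences are valid in more than one of these encodings, which
is exactly where an additional hint would help.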
> - Do NOT use the option -u8 in xterm. This option was a temporary hack
> from the time before we had UTF-8 locales supported by glibc. It is
> now possible to set LANG=en_GB.UTF-8 (or whatever) without causing
> every application that calls setlocale(LC_ALL, "") to spit out an
> error message. Setting the locale to one that uses UTF-8 and then
> starting xterm without option -u8 is the correct way of starting a
> UTF-8 xterm. This will ensure that applications inside a UTF-8
> xterm always also find a UTF-8 locale.
I fully agree. Using UTF-8 in a non-UTF-8 locale is completely wrong,
just as using ISO-8859-1 in a non-ISO-8859-1 locale is wrong.
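In other words, an application should ask the locale which encoding
to use instead of relying on a separate switch. A minimal sketch in
Python, assuming a POSIX system where nl_langinfo(CODESET) is
available; the function names are hypothetical:

```python
import locale


def locale_charmap():
    """Return the codeset of the LC_CTYPE locale from the environment."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The locale named in the environment is not installed;
        # the process keeps whatever locale it already had.
        pass
    return locale.nl_langinfo(locale.CODESET)


def is_utf8(charmap):
    """Compare a charmap name with UTF-8, ignoring case and punctuation."""
    return charmap.lower().replace("-", "").replace("_", "") == "utf8"


def should_emit_utf8():
    """Emit UTF-8 only when the locale says so, never on a private flag."""
    return is_utf8(locale_charmap())
```

With LANG=en_GB.UTF-8 the charmap is reported as UTF-8, so the
application writes UTF-8; in a Latin-1 locale it does not.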
> With the next version of xterm, it will also not be necessary any more
> to specify an iso10646-1 font when you use UTF-8 mode. There will be
> separate font resources such that you can specify fonts for both the
> 8-bit and the 16-bit mode in your ~/.Xdefaults file
Very good. BTW, what is 16-bit mode?
(In Robert's and my patch, xterm has four modes: -8, -u8, -lc, and
-en. If your "8-bit mode" means "-8" and your "16-bit mode" means the
other three modes, I understand what you said. That would imply that
Robert's and my patch will be integrated into "the next version of
xterm". Is that right? If so, it is very good news!)
> - Do NOT use groff -Tlatin1 anywhere (e.g., in /etc/man.config). Instead
> use nroff, which is now a shell script that tests the locale (using
> "locale charmap") and then calls groff with the suitable -T option.
> This will cause nroff and man to work automatically correctly in both
> UTF-8 and Latin-1 locales.
I think internationalization of groff is not yet available. However,
I agree with your idea that -T should not be tied to the encoding. I
think the -Tutf8 mode in the current version of groff is confusing,
because -Tutf8 means UTF-8 _output_ but assumes ISO-8859-1 _input_ (I
may be wrong; input and output might be the other way around).
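For reference, the behaviour of the nroff wrapper you describe can be
sketched as follows. This is written in Python only for illustration
and is not the real nroff script; the mapping table and the fallback
to -Tascii are my assumptions, and the function names are
hypothetical.

```python
import subprocess


def current_charmap():
    """Ask the system for the locale's charmap, as "locale charmap" does."""
    result = subprocess.run(
        ["locale", "charmap"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def groff_device(charmap):
    """Choose a groff output device from the locale's charmap."""
    key = charmap.upper()
    if key == "UTF-8":
        return "utf8"
    if key == "ISO-8859-1":
        return "latin1"
    # Assumption: anything else falls back to plain ASCII output.
    return "ascii"


def nroff_command(charmap, extra_args=()):
    """Build the groff command line the wrapper would run."""
    return ["groff", "-T" + groff_device(charmap), *extra_args]
```

For example, nroff_command(current_charmap(), ["-man"]) selects the
device from the locale, so nothing like -Tlatin1 has to be hard-coded
in /etc/man.config. Note that this only selects the _output_ side;
the input encoding problem mentioned above remains.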
---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N" http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/