Re: Obsolete habits from the time before UTF-8 locale support

Tomohiro KUBOTA Mon, 14 May 2001 03:17:12 -0700
Hi,

At Mon, 14 May 2001 09:39:03 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:

>   - Do NOT define the environment variable LESSCHARSET. The current
>     version of less is able to test the locale and determines automatically
>     whether you want Latin-1 or UTF-8. Defining LESSCHARSET is a habit from
>     the time where less would by default assume everything is in ASCII,
>     which it does not any more. Defining LESSCHARSET deactivates the
>     automatic locale-based selection of the character encoding.

Sure.  However, we should note that less has to deal with two
encodings: the encoding of the input file and the encoding of the
terminal output.  Though the terminal output encoding can be fully
determined by the LC_CTYPE locale, the input file encoding may differ
from the LC_CTYPE encoding.  For example, less with the Japanese patch
has a nice feature that automatically detects which Japanese local
encoding (EUC-JP, Shift_JIS, or ISO-2022-JP) the input file uses.  Ok,
the LC_CTYPE encoding should be the first candidate when guessing the
encoding.  However, it may be nice to have some additional hint
information.
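
A rough sketch of this kind of guessing, in Python.  It is not how the
Japanese less patch works; the function name, the candidate list, and
the rule that an explicit hint is tried before the locale encoding are
all assumptions made for illustration.

```python
import locale

# Fallback candidates, tried after the hint and the locale encoding.
# The list and its order are assumptions for illustration only.
JAPANESE_CANDIDATES = ["iso2022_jp", "euc_jp", "shift_jis"]

def guess_encoding(data, locale_encoding=None, hint=None):
    """Return the first candidate that decodes data cleanly, or None."""
    if locale_encoding is None:
        # Encoding implied by the user's locale settings.
        locale_encoding = locale.getpreferredencoding(False)
    candidates = []
    for name in [hint, locale_encoding] + JAPANESE_CANDIDATES:
        if name and name not in candidates:
            candidates.append(name)
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
        return name
    return None
```

ISO-2022-JP text consists only of 7-bit bytes, so it also decodes
without error under most locale encodings.  That is one case where a
hint (or a check for escape sequences) is needed to get the right
answer.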


>   - Do NOT use the option -u8 in xterm. This option was a temporary hack
>     from the time before we had UTF-8 locales supported by glibc. It is
>     now possible to set LANG=en_GB.UTF-8 (or whatever) without causing
>     every application that calls setlocale(LC_ALL, "") to spit out an
>     error message. Setting the locale to one that uses UTF-8 and then
>     starting xterm without option -u8 is the correct way of starting a
>     UTF-8 xterm. This will ensure that applications find inside a UTF-8
>     xterm always also a UTF-8 locale.

I fully agree.  Using UTF-8 in a non-UTF-8 locale is completely wrong,
just as using ISO-8859-1 in a non-ISO-8859-1 locale is wrong.
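
In other words, an application should ask the locale which encoding to
use instead of being told by a separate switch.  A minimal sketch of
such a check in Python, assuming a POSIX system where nl_langinfo is
available; the function name locale_is_utf8 is hypothetical.

```python
import codecs
import locale

def locale_is_utf8(codeset=None):
    """Report whether the LC_CTYPE codeset is UTF-8."""
    if codeset is None:
        # Adopt the user's locale, then ask for its codeset name.
        locale.setlocale(locale.LC_CTYPE, "")
        codeset = locale.nl_langinfo(locale.CODESET)
    try:
        # codecs.lookup normalizes spellings such as "UTF-8" and "utf8".
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        return False  # unknown codeset name, so do not assume UTF-8
```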


>     With the next version of xterm, it will also not be necessary any more
>     to specify an iso10646-1 font when you use UTF-8 mode. There will be
>     separate font resources such that you can specify fonts for both the
>     8-bit and the 16-bit mode in your ~/.Xdefaults file

Very good.  BTW, what is 16-bit mode?
(In Robert's and my patch, xterm has four modes: -8, -u8, -lc,
and -en.  If your "8-bit mode" means "-8" and "16-bit mode" means
the other three modes, I understand what you said.  That would imply
that Robert's and my patch will be integrated in "the next version of
xterm".  Is that right?  If yes, that is very good news!)


>   - Do NOT use groff -Tlatin1 anywhere (e.g., in /etc/man.config). Instead
>     use nroff, which is now a shell script that tests the locale (using
>     "locale charmap") and then calls groff with the suitable -T option.
>     This will cause nroff and man to work automatically correctly in both
>     UTF-8 and Latin-1 locales.

I think internationalization of groff is not yet available.  However,
I agree with your idea that -T should not be tied to the encoding.  I
think the -Tutf8 mode in the current version of groff is confusing
because -Tutf8 means UTF-8 _output_ and assumes ISO-8859-1 _input_
(I may be wrong; input and output might be the other way around).
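
For reference, the locale test that the nroff wrapper performs could
look roughly like the following, written here in Python instead of
shell.  This is only a sketch: the mapping table covers just the two
locales discussed above, the fallback to the ascii device is an
assumption, and the function names are hypothetical.

```python
import subprocess

# Charmap names as printed by "locale charmap", mapped to groff output
# devices.  The table is illustrative and covers only UTF-8 and Latin-1.
DEVICE_FOR_CHARMAP = {
    "UTF-8": "utf8",
    "ISO-8859-1": "latin1",
}

def groff_device(charmap):
    """Pick a groff -T device for a charmap, falling back to ascii."""
    return DEVICE_FOR_CHARMAP.get(charmap.strip(), "ascii")

def groff_command(args, charmap=None):
    """Build the groff command line that the wrapper would run."""
    if charmap is None:
        # Ask which charmap the current locale uses.
        charmap = subprocess.run(
            ["locale", "charmap"],
            capture_output=True, text=True, check=True,
        ).stdout
    return ["groff", "-T" + groff_device(charmap)] + list(args)
```

With this arrangement /etc/man.config only needs to name nroff, and
the choice of -T follows the locale.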

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/


-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/