There are still some issues (input methods as you found,
localized man pages). Localized man pages are mostly in legacy encodings
and it's hard to figure out how to make them work in a UTF-8 locale (if
at all possible). 'man', 'less' and 'groff' all do things differently
(when it comes to interpreting LC_* and LANG environment variables) and
they interact with each other in an intricate way. At least, I think 'man'
has to be fixed to either call setlocale(LC_MESSAGES,...) directly or
to use the SUS-specified order of resolving LC_*/LANG env. variables
(i.e. 1. LC_ALL 2. LC_XXXX 3. LANG). At the moment, even 'LC_ALL=C man
xyz' doesn't give me man pages in English, let alone 'LC_MESSAGES=C'
when LANG is set to ko_KR.UTF-8. Note that LANG should be given the
lowest precedence in the locale resolution and LC_ALL should be at the
top. Certainly, man doesn't honor that order.
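
For reference, the resolution order described above can be sketched in a few
lines of Python. This is only an illustration of the precedence rule, not what
man does; the function name effective_locale and the fallback to "C" when
nothing is set are assumptions made for the example.

```python
import os


def effective_locale(category, env=None):
    """Return the locale name that applies to `category`, e.g. "LC_MESSAGES".

    `env` is a mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Precedence: LC_ALL first, then the category's own variable, then LANG.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    # Assumption: the real default is implementation-defined; "C" is used here.
    return "C"


# The situation from the text: LANG is Korean, LC_MESSAGES asks for English.
env = {"LANG": "ko_KR.UTF-8", "LC_MESSAGES": "C"}
print(effective_locale("LC_MESSAGES", env))  # messages follow LC_MESSAGES
print(effective_locale("LC_CTYPE", env))     # other categories fall back to LANG
```

With this order, LC_MESSAGES=C would be enough to get English pages while LANG
stays ko_KR.UTF-8, and LC_ALL=C would override everything.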
OK, fixed.
[If you want to improve man, then writing to a random list is
much less effective than writing to the man maintainer.
But I happened to see this.]
A couple of years ago, we discussed how to tag (if we decide
to tag them) the encoding used in man pages, but it got nowhere. A
reasonable approach appears to be to convert them all to UTF-8 (assuming
groff UTF-8 support will come along soon).
My current source looks at a .charset file in the directory
where the man page was found.
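
A rough sketch of such a per-directory lookup, in Python rather than man's own
source. The format of the .charset file is an assumption here (a single
encoding name such as ISO-8859-2 on the first line), as are the function name
read_man_page_as_text and the ISO 8859-1 fallback; compressed pages are not
handled.

```python
from pathlib import Path


def read_man_page_as_text(page_path, default_encoding="iso-8859-1"):
    """Decode a man page using the encoding named in a sibling .charset file.

    Assumption: .charset holds one encoding name as its first word.
    If the file is missing or empty, `default_encoding` is used.
    """
    page = Path(page_path)
    charset_file = page.parent / ".charset"
    encoding = default_encoding
    if charset_file.is_file():
        words = charset_file.read_text(encoding="ascii").split()
        if words:
            encoding = words[0]
    # Decode the raw bytes; the caller can then re-encode as UTF-8 if needed.
    return page.read_bytes().decode(encoding)
```

The returned text could then be encoded as UTF-8 before being handed to the
formatter, provided the formatter accepts UTF-8 input.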
I do not have RH 8.0. Looking at SuSE 8.1 I find that nroff is broken.
In a uxterm:
% cat /usr/share/man/man7/iso_8859-2.7 | iconv -f iso_8859-2 -t UTF-8 \
| gtbl | nroff -mandoc
yields garbage because nroff seems to assume that its input is
ISO 8859-1 and converts it to UTF-8, while in fact it was UTF-8
already.
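
The suspected double conversion is easy to reproduce outside nroff. The Python
sketch below only imitates the effect (UTF-8 bytes misread as ISO 8859-1 and
converted to UTF-8 a second time); it says nothing about how nroff is actually
implemented, and the function names are made up for the example.

```python
def double_encode(text):
    """Encode text as UTF-8, misread those bytes as Latin-1, encode again."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("latin-1")  # each byte becomes one character
    return misread.encode("utf-8")


def undo_double_encode(data):
    """Reverse double_encode: recover the original text from the garbage."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


original = "\u0105"  # LATIN SMALL LETTER A WITH OGONEK, from ISO 8859-2
garbled = double_encode(original)
print(original.encode("utf-8"))  # b'\xc4\x85'
print(garbled)                   # b'\xc3\x84\xc2\x85'
print(undo_double_encode(garbled) == original)  # True
```

Every non-ASCII character ends up as two or more unrelated characters on the
terminal, which matches the kind of garbage seen in the uxterm.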
% nroff --version
GNU nroff (groff) version 1.17.2
Has this been fixed in more recent versions?
Funnily enough, it helps to give nroff the -Tlatin1 switch.
Is nroff documented to accept only latin1 input?
The nroff man page is very poor. E.g., -T is undefined.
Andries
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/