Pablo Saratxaga wrote on another mailing list:
> On Tue, Apr 10, 2001 at 08:58:56AM +0900, Tomohiro KUBOTA wrote:
>
> > (I am now working on "man" command for Debian GNU/Linux. Though it
> > is clear that language for manpages must be determined by LC_MESSAGES
> > category, I have no idea how to treat LANGUAGE.)
>
> The problem with man pages is that they don't declare their encoding; that will
> be a big problem when people start to switch to utf-8.
> IMHO it could be handled by implementing the following in each man viewer:
>
> * determine the encoding used by the user
> * if it is not utf-8, then try to convert the man page to utf-8, if that
>   succeeds, display it.
>   * if that fails, then the chances are high the man page is in the traditional
>     encoding; display it as is.
> * if the encoding of the user is utf-8, then don't display the man page
>   directly but start parsing it: at the first 8bit char, determine if it is
>   a valid utf-8 sequence;
>   * if it is, display the page
>   * if it is not, convert it (assuming the encoding is the traditional encoding
>     for that language, eg euc-jp for ja) and display it.
>
> Yes, it's a bit complicated, but the man pages date from a time when there
> was only ascii...
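For reference, the quoted heuristic could be sketched roughly as below. This is only an illustration under several assumptions: the function names and the TRADITIONAL table are hypothetical, the first branch is read as "convert from UTF-8 into the user's encoding" (one interpretation of the quoted wording), the whole page is validated instead of stopping at the first 8-bit character, and ISO-8859-1 is an assumed fallback for languages missing from the table.

```python
import codecs

# Hypothetical table: the traditional (pre-UTF-8) encoding per language.
TRADITIONAL = {"ja": "euc_jp", "ru": "koi8_r", "de": "iso8859_1"}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def page_for_display(raw: bytes, user_encoding: str, lang: str) -> bytes:
    """Return the bytes to show to a user whose terminal uses user_encoding."""
    user_is_utf8 = codecs.lookup(user_encoding).name == "utf-8"

    if not user_is_utf8:
        try:
            # Assumed reading: the page may be UTF-8, so convert it
            # into the user's encoding.
            return raw.decode("utf-8").encode(user_encoding)
        except UnicodeError:
            # Probably a page in the traditional encoding: show it as is.
            return raw

    if is_valid_utf8(raw):
        return raw

    # Not UTF-8: assume the traditional encoding for this language
    # (ISO-8859-1 is an assumed fallback, not part of the proposal).
    legacy = TRADITIONAL.get(lang, "iso8859_1")
    return raw.decode(legacy, errors="replace").encode("utf-8")
```

For example, page_for_display(page, "UTF-8", "ja") would recode an EUC-JP page to UTF-8 and pass an already valid UTF-8 page through unchanged.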
Sounds complicated indeed. Why would you convert the man page itself
to UTF-8, when groff already has an option (-Tutf8) to produce UTF-8
output?
I'd suggest:
- Assume the manpages are in traditional format. The groff
  developers will have to define how UTF-8 manpages shall define
  their encoding.
- If the encoding used by the user (`locale charmap`) is UTF-8,
  add the option "-Tutf8" to the groff command line.
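The two rules above could look roughly like this in a wrapper script. It is a minimal sketch, not how any existing "man" implementation builds its pipeline: the function names are hypothetical, and the bare "groff -man" invocation is an assumed simplification of the real formatting command.

```python
import locale


def current_charmap() -> str:
    """Return the user's charmap, comparable to the output of `locale charmap`."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # unsupported locale in the environment: keep the default "C"
    return locale.nl_langinfo(locale.CODESET)  # Unix only


def extra_groff_options(charmap: str) -> list:
    """Return the extra groff options implied by the user's charmap."""
    normalized = charmap.strip().upper().replace("_", "-")
    if normalized in ("UTF-8", "UTF8"):
        return ["-Tutf8"]
    # Any other charmap: leave the traditional behaviour untouched.
    return []


def groff_command(page_path: str, charmap: str) -> list:
    """Build a (simplified, assumed) groff command line for one man page."""
    return ["groff", "-man", *extra_groff_options(charmap), page_path]
```

Calling groff_command("ls.1", current_charmap()) in a UTF-8 locale would then yield a command line containing "-Tutf8", and the unchanged traditional command line otherwise.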
Now all that will remain to be done is to fix 'more' and 'less' to
correctly display the resulting UTF-8 encoded output.
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/