Pablo Saratxaga wrote on another mailing list:

> On Tue, Apr 10, 2001 at 08:58:56AM +0900, Tomohiro KUBOTA wrote:
>  
> > (I am now working on "man" command for Debian GNU/Linux.  Though it
> > is clear that language for manpages must be determined by LC_MESSAGES
> > category, I have no idea how to treat LANGUAGE.)
> 
> The problem with man pages is that they don't declare their encoding; that
> will be a big problem when people start to switch to UTF-8.
> IMHO it could be handled by implementing the following in each man viewer:
> 
> * determine the encoding used by the user
> * if it is not utf-8, then try to convert the man page to utf-8; if that
>   succeeds, display it.
> * if that fails, then the chances are high the man page is in the traditional
>   encoding; display it as is.
> * if the encoding of the user is utf-8, then don't display the man page
>   directly but start parsing it: at the first 8-bit char, determine whether
>   it is a valid utf-8 sequence;
> * if it is, display the page
> * if it is not, convert it (assuming the encoding is the traditional encoding
>   for that language, eg euc-jp for ja) and display it.
> 
> Yes, it's a bit complicated, but man pages date from a time when there
> was only ASCII...
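
For reference, the detection step in the quoted proposal (check whether the
page is valid UTF-8, otherwise assume the traditional encoding for the
language) could be sketched roughly as below. This is only an illustration,
not code from any man viewer: the function name, the lookup table and the
ISO-8859-1 default are assumptions, and only the ja/euc-jp pairing comes from
the quoted message. For simplicity it validates the whole page instead of
stopping at the first 8-bit character.

```python
# Hypothetical table of traditional encodings per language.
# Only the "ja" entry is taken from the quoted message.
TRADITIONAL_ENCODING = {"ja": "euc-jp"}


def decode_man_page(raw: bytes, lang: str) -> str:
    """Decode a man page source, guessing UTF-8 first, then the traditional encoding."""
    try:
        # Pure ASCII pages and valid UTF-8 pages both succeed here.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the traditional encoding for this language.
        # The ISO-8859-1 default is an assumption made for this sketch.
        fallback = TRADITIONAL_ENCODING.get(lang, "iso-8859-1")
        return raw.decode(fallback)
```

A page stored in EUC-JP fails the UTF-8 check and is decoded through the
fallback table; a page already in UTF-8 is returned unchanged.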

Sounds complicated indeed. Why would you convert the man page itself
to UTF-8, when groff already has an option (-Tutf8) to produce UTF-8
output?

I'd suggest:
  - Assume the manpages are in traditional format. The groff
    developers will have to define how UTF-8 manpages shall define
    their encoding.
  - If the encoding used by the user (`locale charmap`) is UTF-8,
    add the option "-Tutf8" to the groff command line.
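
A minimal sketch of that rule in Python, assuming the viewer builds the
groff command line itself. The function names and the "latin1" fallback
device are hypothetical; only the -Tutf8 option and the use of the locale's
charmap come from the suggestion above.

```python
import locale


def user_charmap() -> str:
    """Return the charmap of the user's locale, like `locale charmap`."""
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)


def groff_command(charmap: str, page_path: str, fallback_device: str = "latin1") -> list:
    """Build a groff command line, selecting -Tutf8 for UTF-8 locales."""
    # Accept spellings such as "UTF-8", "utf8" and "utf-8".
    is_utf8 = charmap.upper().replace("-", "") == "UTF8"
    # The fallback device is an assumption of this sketch, not part of the proposal.
    device = "utf8" if is_utf8 else fallback_device
    return ["groff", "-man", "-T" + device, page_path]
```

A viewer would call something like groff_command(user_charmap(), path) and
pipe the result to the pager. The page source itself is left untouched, so
no separate conversion step is needed.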

Now all that will remain to be done is to fix 'more' and 'less' to
correctly display the resulting UTF-8 encoded output.

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
