Tomohiro KUBOTA wrote on 2001-04-10 13:59 UTC:
> However, I don't think it is a good idea to specify encoding
> using "-T" option. "-T" is used for device type, which is
> independent concept from encoding. I suggest "-Ttty".
> In default, output encoding is LC_CTYPE. If needed, output
> encoding can be specified using a new command option.
Fully agreed here.
I'd also like to repeat my previous suggestion: a distribution such as
Debian can simply mandate that all man pages have to be installed in
UTF-8 and nothing else. This saves us the trouble of adding both a
character encoding identification mechanism and a character encoding
conversion mechanism to the input of groff.
Distribution maintainers can quite easily maintain such a policy. They
just have to send non-ASCII/non-UTF-8 man pages once through iconv
before packaging them. End users who install third-party
non-ASCII/non-UTF-8 man pages (pretty few people actually do that) also
have to apply iconv once to the installed man page if it turns out that
it is not already in UTF-8. If someone forgets to do so, groff will spot
malformed UTF-8 sequences and should abort with a message like:
  ERROR: the source document /usr/local/man/man4/whatever.4
  contains non-ASCII characters but is apparently not UTF-8 encoded.
  SOLUTION: Please determine the original encoding of this source
  document and use "iconv" or a similar recoding tool to convert
  it to UTF-8.
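The check that would trigger such a message can be sketched in a few lines. This is only an illustration in Python, not groff code: the function name check_utf8 and the constant ERROR_TEMPLATE are made up for this example, and a real implementation inside groff would do the equivalent validation in its own input layer.

```python
from typing import Optional

# Message text taken from the proposal above; {path} is filled in per file.
ERROR_TEMPLATE = (
    "ERROR: the source document {path}\n"
    "contains non-ASCII characters but is apparently not UTF-8 encoded.\n"
    "SOLUTION: Please determine the original encoding of this source\n"
    'document and use "iconv" or a similar recoding tool to convert\n'
    "it to UTF-8."
)


def check_utf8(data: bytes, path: str) -> Optional[str]:
    """Return None if data is well-formed UTF-8, else the error text.

    Hypothetical helper: pure ASCII passes because ASCII is a subset
    of UTF-8; any malformed sequence makes the strict decode fail.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return ERROR_TEMPLATE.format(path=path)
    return None
```

Pure ASCII pages pass unchanged, so only pages that really contain bytes in a legacy 8-bit or multibyte encoding are rejected.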
That truly sounds far simpler to me than expecting all people to add
some yet-to-be-invented character set marker to the beginning of every
man page. If you think about it, adding a character set identifier has
no advantage over just converting everything to UTF-8. Both change the
file and require someone to determine the encoding manually.
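To make the one-time conversion concrete, here is a minimal Python sketch of the recoding step, roughly what "iconv -f ENCODING -t UTF-8" does for a single file. The function name recode_to_utf8 is hypothetical, and the source encoding still has to be supplied by a person who knows it; nothing here guesses it.

```python
def recode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Recode data from a known source encoding to UTF-8.

    Hypothetical helper. source_encoding is a Python codec name such
    as "iso-8859-1" or "euc_jp"; a wrong or unknown name raises
    UnicodeDecodeError or LookupError instead of silently producing
    garbage.
    """
    text = data.decode(source_encoding)
    return text.encode("utf-8")
```

The manual effort is the same as for adding a marker (someone has to know the encoding), but afterwards the file needs no special handling at all.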
It would be really nice if groff supported such an installation policy
as an option. For example,
  groff --utf8-input -Ttty
should cause groff to accept only UTF-8 input files (the man page source
file encoding becomes hardwired and not locale-dependent here), and then,
as Tomohiro suggested, LC_CTYPE will determine the output encoding
according to the user's locale.
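How the output encoding falls out of the user's locale can be sketched as follows. This is an illustrative Python approximation, not how groff does or would do it: the function output_codeset and its "US-ASCII" fallback are assumptions made for this example. A real program would normally call setlocale(LC_CTYPE, "") followed by nl_langinfo(CODESET) and let the C library resolve the name.

```python
from typing import Mapping


def output_codeset(env: Mapping[str, str], default: str = "US-ASCII") -> str:
    """Derive the output codeset name from locale environment variables.

    Hypothetical helper. Follows the POSIX precedence for the LC_CTYPE
    category: LC_ALL overrides LC_CTYPE, which overrides LANG. Locale
    names look like language[_territory][.codeset][@modifier].
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            break
    else:
        # No locale variable set at all: behave like the "C" locale.
        return default

    name = value.split("@", 1)[0]  # drop a modifier such as "@euro"
    if "." in name:
        return name.split(".", 1)[1]
    # Names without an explicit codeset ("C", "ja_JP") are resolved by
    # the C library on a real system; this sketch just uses the default.
    return default
```

With LANG=en_US.UTF-8 and LC_CTYPE=ja_JP.eucJP this yields "eucJP", that is, the input stays UTF-8 while the output follows whatever the user's terminal expects.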
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/