Andries,
Currently on a Linux system you find man pages in the following encodings:
- ISO-8859-1 (German, Spanish, French, Italian, Brazilian, ...),
- ISO-8859-2 (Hungarian, Polish, ...),
- KOI8-R (Russian),
- EUC-JP (Japanese),
- UTF-8 (Vietnamese),
- ISO-8859-7, ISO-8859-9, ISO-8859-15, ISO-8859-16 (man7/*),
and none of them contains an encoding marker.
The goal is that "groff -T... -mandoc" works on any man page, without
the need to specify the encoding as an argument to groff.
There are two options:
a) Recognize only UTF-8 encoded man pages. This is the simplest.
groff will be changed to emit errors when it is fed non-UTF-8
input, so that the man page maintainers are notified that they need to
convert their man pages to UTF-8.
b) Recognize the encoding according to a note in the first line, such as
'\" -*- coding: EUC-JP -*-
groff will then emit errors when it is fed non-ASCII input that has no
coding: marker, so that man page maintainers are notified that
they need to add the coding: marker.
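
To make the two options concrete, here is a rough Python sketch of the
checks each one implies. It is only an illustration, not how groff does
or would implement it: the function names are hypothetical, and the
marker syntax is assumed to be exactly the form shown in option b).

```python
import re

# Assumed marker syntax: "-*- coding: NAME -*-" on a roff comment line,
# as in the example under option b). All names below are hypothetical.
CODING_RE = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9._-]+).*?-\*-")


def declared_encoding(page: bytes):
    """Return the encoding named by a first-line marker, or None."""
    first_line = page.split(b"\n", 1)[0]
    # Only a roff comment line ('\" or .\") is allowed to carry the marker.
    if not first_line.startswith((b"'\\\"", b".\\\"")):
        return None
    match = CODING_RE.search(first_line)
    return match.group(1).decode("ascii") if match else None


def accept_option_a(page: bytes) -> bool:
    """Option a): accept the page only if it is valid UTF-8."""
    try:
        page.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def accept_option_b(page: bytes) -> bool:
    """Option b): accept pure ASCII, or non-ASCII with a coding: marker."""
    if page.isascii():
        return True
    return declared_encoding(page) is not None
```

Under these assumptions, an EUC-JP page that carries the marker would be
rejected by a) until it is converted, but accepted as-is by b).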
Which of the two would you, as Linux man pages maintainer, prefer?
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/