Re: groff, man and Unicode

Andries . Brouwer Mon, 09 Dec 2002 11:33:37 -0800

    From [EMAIL PROTECTED]  Mon Dec  9 18:46:01 2002

    My personal opinion:


      - There are basically two options for determining the input encoding
        of groff, and they are not mutually exclusive:

        a) Man somehow "knows" (e.g. from a config file that lists the
           character encoding on a per-subdirectory basis) which
           character encoding each man page is in and simply tells
           groff what its input character encoding is via a (to be added)
           command-line option like "-eUTF-8".

What I have now is: Man somehow "knows" (e.g. from the file .charset
near the man page) what character encoding a man page is in, and also
(e.g. from the user's locale) what character encoding is desired.
It does an iconv from the man page encoding to the desired encoding
and feeds that to nroff.
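
As a rough illustration of the scheme just described, here is a minimal
Python sketch. It is not the code in man itself. It assumes that .charset
holds a single encoding name, that pages without a .charset file are
ISO-8859-1, and that the desired encoding is whatever the locale reports;
the function names are hypothetical.

```python
import locale
import subprocess
from pathlib import Path
from typing import Optional


def page_encoding(page: Path, default: str = "ISO-8859-1") -> str:
    """Return the encoding named in a .charset file next to the page.

    Assumption: the file contains just one encoding name. If it is
    missing or empty, fall back to `default` (also an assumption).
    """
    marker = page.parent / ".charset"
    if marker.is_file():
        name = marker.read_text(encoding="ascii").strip()
        if name:
            return name
    return default


def recode_page(page: Path, target: Optional[str] = None) -> bytes:
    """Convert the page from its own encoding to the desired one."""
    source = page_encoding(page)
    if target is None:
        # Desired encoding taken from the user's locale.
        target = locale.getpreferredencoding(False)
    text = page.read_bytes().decode(source)
    # Characters the target cannot represent become '?'.
    return text.encode(target, errors="replace")


def format_page(page: Path, target: Optional[str] = None) -> bytes:
    """Feed the recoded page to nroff, which does no recoding itself."""
    result = subprocess.run(
        ["nroff", "-man"],
        input=recode_page(page, target),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

The point of the split is that recode_page() is the only place where bytes
change meaning; nroff just sees input in the encoding the user wants.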

      - Man should never convert the encoding of man pages, because
        where two programs in a pipeline recode characters, this promises to
        hide and obscure problems later in difficult-to-understand ways.

It is clear that there must be a recode somewhere.
It is not clear to me that it would be preferable to do this in nroff.
One of the advantages of iconv in man is that it works today, also
with old *roff. I am unhappy with recode in groff.
Whenever programs fiddle with one's bits, one has to struggle
to tell them to keep their hands off.

The Unix philosophy: iconv does conversion, groff does formatting.

      - groff really should scrap the character encoding variants 
        (ascii, ascii8, latin1, utf8, cp1047, nippon, etc.) from the -T
        option. The -T option should switch between ps, dvi, ..., html
        and text.

I agree completely.

        The new "text" option outputs plaintext (so far called ascii), and the
        locale setting (or if really necessary a new command line option
        "-EISO-8859-1" or so to override the locale) defines the encoding
        of this plaintext output. The output format (ps, text, html) and the
        encoding used must be handled completely orthogonally (i.e., use
        different command line options), because both the text and html
        output format could use different encodings. You can keep "-Tlatin1"
        as a backwards compatible hack for "-Ttext -EISO-8859-1", etc.
        of course.
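
The backwards-compatibility mapping proposed in the quoted text could be
sketched as follows. Note that "-Ttext" and "-E" are only the options
proposed above, not options groff has; the table and the function name
are hypothetical, and only three of the legacy device names are filled in.

```python
# Hypothetical table: legacy -T device name -> (output format, encoding).
LEGACY_DEVICES = {
    "ascii": ("text", "US-ASCII"),
    "latin1": ("text", "ISO-8859-1"),
    "utf8": ("text", "UTF-8"),
}


def expand_device(args):
    """Rewrite legacy '-T<encoding>' arguments into the proposed
    orthogonal pair '-T<format> -E<encoding>'; leave the rest alone."""
    out = []
    for arg in args:
        if arg.startswith("-T") and arg[2:] in LEGACY_DEVICES:
            fmt, enc = LEGACY_DEVICES[arg[2:]]
            out += ["-T" + fmt, "-E" + enc]
        else:
            out.append(arg)
    return out
```

With this, expand_device(["-Tlatin1", "-man"]) gives
["-Ttext", "-EISO-8859-1", "-man"], while "-Tps" passes through untouched.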

I would like a groff that by default produces output in the same
character set as its input. Of course it needs to know whether the input
is in an 8-bit encoding or something more complicated, but in the common
case of 8-bit encoding and plain text output it may not even be necessary
to know anything about the character set. Thus, things would
"just work" with ISO 8859-2 or KOI8-U even when the user does not
set any locale.
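
To make the 8-bit case concrete, here is a toy line filler in Python that
works on raw bytes. It is only a sketch of the idea, not of groff: the
function name and the width are made up. It never asks which character
set the bytes are in, and it relies on one byte being one character, which
is exactly why it would not do for something more complicated like UTF-8.

```python
def fill_bytes(data: bytes, width: int = 20) -> bytes:
    """Fill words into lines of at most `width` bytes.

    Words are split on ASCII whitespace only, so every byte >= 0x80 is
    treated as an ordinary word character and copied through unchanged.
    """
    lines = []
    current = b""
    for word in data.split():  # bytes.split() splits on ASCII whitespace
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)
            current = word
        else:
            current = current + b" " + word if current else word
    if current:
        lines.append(current)
    return b"\n".join(lines) + b"\n"
```

The same call works for a page in ISO 8859-2 and one in KOI8-U: the bytes
that come out are the bytes that went in, only the line breaks move.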

The system you propose sounds more fragile.

Andries
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
