On Thu, Jul 07, 2005 at 08:16:29PM -0400, srintuar wrote:

> I would advocate going one step further, and simply
> decree that henceforth all man pages will be in NFC UTF-8, period.
> I don't see supporting multiple legacy encodings as having much value.

I'd agree with you completely if we were designing this system from scratch.
However, we do have to care about compatibility issues.

Even mainstream English manual pages often contain the Latin-1 © (copyright)
sign; see, for example, the coreutils man pages (cp, tail, ...). There are
most likely many similar man pages out there in the world. Users will install
lots of them manually by simply typing "make install", without running iconv
on them, and will still expect them to be readable. Distributors, however,
could easily do this conversion when creating their packages, so that the man
pages shipped by a distro are always in UTF-8.
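
A minimal sketch of such a packaging-time conversion, assuming the legacy
encoding of a page is known to the packager. The function name to_utf8 and
its default of Latin-1 are hypothetical, chosen only for illustration:

```python
def to_utf8(raw: bytes, legacy: str = "latin-1") -> bytes:
    """Return the page re-encoded as UTF-8.

    Input that already decodes as UTF-8 is returned unchanged, so the
    conversion can safely be run more than once on the same file.
    The `legacy` encoding is an assumption supplied by the caller.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        return raw.decode(legacy).encode("utf-8")


# A Latin-1 copyright line becomes the two-byte UTF-8 sequence for U+00A9.
page = b"Copyright \xa9 2005"
converted = to_utf8(page)
print(converted)
```

Because valid UTF-8 passes through untouched, a distributor could apply this to
every page in a package without tracking which ones were already converted.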

(The funny part is that many Hungarian man pages also contain the Latin-1
copyright symbol, even though Latin-1 is not suitable for Hungarian accents.
Our legacy encoding is Latin-2, in which the copyright sign is replaced by
some other symbol. So these pages are encoded neither in Latin-1 nor in
Latin-2, but in a mixture of the two. So this iconv'ing wouldn't be sooooo
easy... :-))
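
The ambiguity can be shown with two bytes. This is only an illustration: the
sample string is made up, and 0xF5 is picked as one example of a byte that
Hungarian text needs from Latin-2:

```python
# The same bytes read under the two legacy encodings.
copyright_byte = b"\xa9"  # copyright sign in Latin-1
accent_byte = b"\xf5"     # o with double acute in Latin-2

print(copyright_byte.decode("latin-1"))    # ©
print(copyright_byte.decode("iso8859-2"))  # Š
print(accent_byte.decode("latin-1"))       # õ
print(accent_byte.decode("iso8859-2"))     # ő

# A made-up "mixed" page: no single legacy encoding gets both bytes right.
mixed = b"\xa9 2005 \xf5"
as_latin1 = mixed.decode("latin-1")    # copyright right, accent wrong
as_latin2 = mixed.decode("iso8859-2")  # accent right, copyright wrong
print(as_latin1, "|", as_latin2)
```

Either choice of encoding damages one of the two characters, which is why a
plain iconv run cannot fix such pages on its own.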

So temporarily, for several years, I think it's better to assume Latin-1 (or,
better, the legacy encoding of the language whose directory the man page
resides in) for pages that are not valid UTF-8 and issue a warning, rather
than to print ?'s or U+FFFD's at the invalid places.
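
A rough sketch of this fallback, under stated assumptions: the function
decode_manpage and the table LEGACY_BY_LANG are hypothetical names, and the
language-to-encoding entries are examples only, not a complete or
authoritative list:

```python
import warnings
from typing import Optional

# Hypothetical mapping from man page language directory to legacy encoding.
LEGACY_BY_LANG = {"hu": "iso8859-2", "pl": "iso8859-2", "ru": "koi8-r"}


def decode_manpage(raw: bytes, lang: Optional[str] = None) -> str:
    """Decode as UTF-8; on failure, warn and fall back to a legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 is the default when the language directory is unknown.
        legacy = LEGACY_BY_LANG.get(lang, "latin-1")
        warnings.warn(f"page is not valid UTF-8, assuming {legacy}")
        return raw.decode(legacy)


# The alternative being argued against: replacement characters.
replaced = b"\xa9".decode("utf-8", errors="replace")
```

With this, decode_manpage(b"\xa9") yields the copyright sign plus a warning,
while the strict-UTF-8 approach stored in `replaced` yields only U+FFFD.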

On the other hand, IMHO introducing character set meta-information in the
files overcomplicates things. I don't think it's any easier for anyone to
insert this piece of information than to convert the page to UTF-8. I'd also
like to be able to create/edit/view the raw man pages in any standard text
editor with the very same system-wide default settings (I mean UTF-8), so IMHO
we should be heading towards pure UTF-8, not towards a mixture of various
encodings, nor towards stupid escape sequences.



-- 
Egmont

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
