On Thu, Jul 07, 2005 at 08:16:29PM -0400, srintuar wrote:
> I would advocate going one step futher, and simply
> decree than henceforth all man pages will be in NFC UTF-8, period.
> I dont see supporting multiple legacy encodings as having much value.
I'd agree with you completely if we were designing this system from scratch. However, we do have to care about compatibility. Even mainstream English manual pages often contain the Latin-1 © (copyright) sign; see, e.g., the man pages of coreutils (cp, tail, ...). Most likely there are many similar man pages out there in the world, and users will install lots of them manually by simply typing "make install". They won't iconv them first, yet they will still expect them to be readable. Distributors, on the other hand, could easily do this conversion when building their packages, so that the man pages shipped by a distro are always in UTF-8.

(The funny part is that many Hungarian man pages also contain the Latin-1 copyright symbol, even though Latin-1 is not suitable for Hungarian accents: our legacy encoding is Latin-2, in which the copyright sign's position holds some other symbol. So these pages are encoded neither in Latin-1 nor in Latin-2, but in a mixture of the two, and iconv'ing them wouldn't be sooooo easy... :-))

So for the time being, for several years, I think it's better to assume Latin-1 (or, better, the legacy encoding of the language whose directory the man page resides in) for pages that are not valid UTF-8 and issue a warning, rather than print some ?'s or U+FFFD's at the invalid places.

On the other hand, IMHO introducing charset meta-information in the files makes things overcomplicated. I don't think it's any easier for anyone to insert this piece of information than to convert the page to UTF-8. I'd also like to be able to create/edit/view the raw man pages in any standard text editor with the very same system-wide default settings (I mean UTF-8), so IMHO we should be heading towards pure UTF-8, not towards a mixture of various encodings, nor towards stupid escape sequences.

--
Egmont

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
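The fallback proposed in the message (try strict UTF-8 first; otherwise decode with the legacy charset of the page's language directory and print a warning instead of emitting ?'s or U+FFFD's) could look roughly like the following Python sketch. The `man/<lang>/manN/` directory layout, the `LEGACY_ENCODINGS` table and the function names are hypothetical illustrations, not taken from man-db or any other existing man implementation.

```python
import sys
from pathlib import Path

# Hypothetical table: language directory -> legacy 8-bit charset.
# Entries are illustrative assumptions; anything unlisted falls back to Latin-1.
LEGACY_ENCODINGS = {
    "hu": "iso-8859-2",
    "pl": "iso-8859-2",
    "cs": "iso-8859-2",
}
DEFAULT_LEGACY = "iso-8859-1"  # Latin-1


def legacy_encoding_for(path: Path) -> str:
    """Guess the legacy charset from an assumed .../man/<lang>/manN/page layout."""
    parts = path.parts
    for i, part in enumerate(parts):
        # A section directory looks like "man1", "man3p", ...
        if part.startswith("man") and part[3:4].isdigit() and i > 0:
            # "hu_HU.ISO-8859-2" -> "hu"; for ".../man/man1" this yields "man".
            lang = parts[i - 1].split("_")[0].split(".")[0]
            return LEGACY_ENCODINGS.get(lang, DEFAULT_LEGACY)
    return DEFAULT_LEGACY


def decode_man_page(raw: bytes, path: Path) -> str:
    """Decode as UTF-8 if valid; otherwise warn and use the legacy charset."""
    try:
        return raw.decode("utf-8")  # strict: any invalid sequence raises
    except UnicodeDecodeError:
        enc = legacy_encoding_for(path)
        print(f"warning: {path} is not valid UTF-8, assuming {enc}",
              file=sys.stderr)
        # Latin-1 and Latin-2 assign a character to every byte value, so
        # errors="replace" is only a safety net for other table entries.
        return raw.decode(enc, errors="replace")
```

Because this picks a single charset per page, it cannot repair the mixed Latin-1/Latin-2 Hungarian pages mentioned in the message: in a `hu` directory, a Latin-1 © byte (0xA9) decodes as Š under Latin-2.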
