[Please honour my Mail-Followup-To: header.]

On Wed, Mar 31, 2004 at 04:41:44PM +0300, Martin-Éric Racine wrote:
> On Wed, 31 Mar 2004, Colin Watson wrote:
> > On Wed, Mar 31, 2004 at 09:05:04AM +0300, Martin-Éric Racine wrote:
> > > Heck, if you ask me, Sarge should be known as the "we upgrade everyone
> > > to UTF-8" Debian release. This would imply that absolutely every
> > > package to be released in Sarge would know about legacy encodings for
> > > each locale and be able to recode every config file, man page,
> >
> > Not possible. UTF-8 man pages are not yet supported by groff, and won't
> > be until groff 2.0.
>
> Do you envision this as being possible for Sarge+1 then?

That entirely depends on when it gets implemented in groff. It's
something upstream is working on and something on which there's been
some incremental progress in recent versions of groff, but it's
fundamentally hard and will probably involve incompatible changes.

> At this point, it seems that all Debian-specific tools either default to
> UTF-8 or can handle UTF-8, so it doesn't seem like such a difficult goal.

I don't think you'd say that if you knew more about groff internals. The
assumption of ISO-8859-* runs deep (chiefly ISO-8859-1 - it's only
recently that decent support for ISO-8859-2 and ISO-8859-9 was added),
and it needs quite a few internal changes to remove that assumption. It
is not at all a simple matter of adding calls to iconv, since groff
needs to know more about the text than that. The extensive Debian patch
to groff manages to support Japanese and possibly other CJK languages,
but that patch is so extensive that I've been unable to update it to
groff 1.19.

man (mostly) supports you running in a UTF-8 locale by means of some
complicated iconv kludges. It will *not* generally support UTF-8 in
source man pages until groff upstream supports it. While it would be
possible to shove another iconv in at the start (and in fact this is
done for ja_JP.UTF-8 for evil reasons), I don't want to put myself in
the position of having to convert the world twice in the event that
groff upstream do the transition slightly differently from the way I'd
do it.

I think UTF-8 is the future, use it fairly extensively myself, and will
continue to work towards having it everywhere, but let's not make the
mistake Red Hat made of pushing UTF-8 beyond the current capabilities of
our software.

Cheers,

--
Colin Watson [EMAIL PROTECTED]

