On Wed, Jan 15, 2003 at 04:41:57PM +0900, Junichi Uekawa wrote:
> > > Not all of the statements made in that thread are not quite true,
> > > and I seem to remember seeing some hacks done by Ukai-san on that
> > > respect, for UTF-8.
> >
> > Hmmm...could you elaborate?
>
> I think our man-db and groff have been hacked in two ways:
>
> 1) to special-case japanese locale (ja_JP.eucJP) and
>    act specially in that case only (using -Tnippon device)
>
> 2) to work with utf-8

2) is present in groff upstream, actually, but 1) interferes with it in
some exciting ways. We can probably manage to patch it up so that UTF-8
doesn't break quite so badly, but really it's almost impossible to get
completely correct output in all encodings from current groff, which has
historically had a hard-coded expectation of ISO-8859-1 input that
reaches quite deeply into its design. There is no (standard) way for a
document to state its encoding. groff 2.0 is planned to fix this by,
among other things, changing its input encoding expectation to be UTF-8
instead, but that's some way off yet.

man has a big table of language directories and what groff output
devices are conventional in each. It's clearly not exactly ideal, but
it's the best we've got for now.

I think it is undeniably true that the man-db/groff toolchain is not yet
ready for Debian policy to mandate UTF-8.

> I seem to remember 1 was the case in potato, or woody, breaking
> use under ja_JP.utf-8.

ja_JP.UTF-8 may be hackable in man nowadays; please send patches if you
can get it to work. :)

> I think Colin Watson should know better about the status...

I can supply pointers, but Fumitoshi UKAI is the real expert on groff
encodings.

-- 
Colin Watson [EMAIL PROTECTED]

