Joshua Slive <[EMAIL PROTECTED]> writes:

> > Japanese manpages encoding is usually euc-jp. I don't
> > know Joshua's program can handle it (if it is 8bit clean,
> > probably it is OK). If program runs fine, then you have to
> > convert it to iso-2022-jp.
>
> I don't really know either. It uses mostly c-lib string manipulation, but
> there are a couple funky things to handle overstrike/bold, etc. I am
> attaching the code to this message. (It is very ugly, but it basically
> works. See the comments at the top for usage.)

It works, but the output is ugly because nroff adds many spaces to
fill paragraphs. It might be a good idea to define a subset of nroff
commands as an XML DTD and write a new converter. All Apache man pages
use a basic set of man macros and nroff commands, so bypassing nroff
would be a huge win for a converter. If the majority agrees, I
volunteer to define the DTD and implement the converter.

> Is there any standard for how operating system vendors and other handle
> the issue of multi-lingual manual pages?

Not that I know of. These are the locations of Japanese man pages:

Debian:    /usr/share/man/ja
Kondara:   /usr/man/ja
Laser5:    /usr/man/ja, /usr/man/ja_JP.ujis/
Solaris 8: /usr/man/ja, /usr/man/ja_JP.PCK/, /usr/man/japanese

All of them except Solaris use euc-jp as the character encoding
scheme. Solaris 8 uses two encodings: euc-jp for /usr/man/ja and
shift_jis for /usr/man/ja_JP.PCK/. /usr/man/japanese is a symlink to
/usr/man/ja.

I've heard Kondara might use UTF-8 in a future version. I don't have
access to other systems with Japanese man pages installed.

It seems like using euc-jp and forgetting about installation is the
way to go.

-- 
Yoshiki Hayashi
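
P.S. For the encoding step discussed above (euc-jp source, iso-2022-jp where a 7-bit form is needed), here is a minimal sketch of what the conversion could look like. It is only an illustration, not part of the attached program: the function name convert_man_page and its defaults are made up for this example, and it assumes Python's standard euc_jp, shift_jis and iso2022_jp codecs are acceptable for the job.

```python
def convert_man_page(data: bytes, src: str = "euc_jp", dst: str = "iso2022_jp") -> bytes:
    """Re-encode raw man page bytes from one Japanese encoding to another.

    Hypothetical helper for illustration; src and dst are Python codec names.
    """
    # Decode to Unicode first, then encode to the target scheme.
    # Invalid input raises UnicodeDecodeError rather than being silently mangled.
    text = data.decode(src)
    return text.encode(dst)


if __name__ == "__main__":
    # A short euc-jp sample standing in for a real man page.
    sample = "日本語のマニュアル".encode("euc_jp")
    converted = convert_man_page(sample)
    # Decode again to confirm the text survived the conversion.
    print(converted.decode("iso2022_jp"))
```

The same helper would presumably cover the Solaris case too, for example src="shift_jis" and dst="euc_jp" for pages taken from /usr/man/ja_JP.PCK/.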