Is there any best-practice information on how to use groff with Eastern European languages/characters? In particular, best practices for authoring man pages?
I'd especially like to know what guidance there might be on what to do for code points that don't have named characters/escapes in groff (ű, ő, ż, ă, ą, ā, ș, ć, č, etc.); basically Latin-2, I guess (or Unicode Latin Extended-A). Do authors writing content in Eastern European languages generally use those characters as-is in their groff source, or do they use escape sequences to do overstrikes to compose them?

The context for this question is that I'd like to know what would be best for the DocBook manpages stylesheet to generate for those languages. For Latin-1/Western European languages, the stylesheet converts any non-Roman/accented characters to their corresponding groff named characters/escapes. It does the same for a whole bunch of symbols as well (not just letters). So even if a user has kept the output encoding for the stylesheet at its default value (UTF-8), the generated man pages for most of those languages will generally contain only ASCII characters.

But for Eastern European languages, if the user has UTF-8 source and keeps the output encoding for the stylesheet set at its default value, any UTF-8 characters in the source that don't have named characters in groff are passed through as-is. What happens with that man page after that depends, I guess, on what system(s) it ends up on. I know that on my Debian system, the installed man-page files all seem to be encoded in UTF-8, and the backend for the man command converts those on the fly (I suppose by calling iconv or something similar). But I'm not sure whether users on other systems can depend on something like that.

Anyway, any guidance/suggestions on what would be best to have the stylesheet generate for Eastern European languages would be appreciated.

--Mike

--
Michael(tm) Smith
http://people.w3.org/mike/
http://sideshowbarker.net/
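
To make the two behaviors described above concrete, here is a rough Python sketch of an ASCII-only conversion. It is an illustration only, not the DocBook stylesheet's actual code: the function name to_groff_ascii and the tiny GROFF_NAMED table are hypothetical. Characters with a groff named escape are mapped to it, and anything else falls back to groff's \[uXXXX] Unicode escape form (described in groff_char(7)) instead of being passed through as raw UTF-8. Whether that fallback renders usefully depends on the groff version and output device on the target system, which this sketch does not address.

```python
# Hypothetical sketch: map non-ASCII characters to groff escapes so that
# the generated man page source contains only ASCII.
# The table below is a tiny sample of groff named characters, not a full list.
GROFF_NAMED = {
    "\u00e4": r"\[:a]",  # a with diaeresis
    "\u00e9": r"\['e]",  # e with acute
    "\u00df": r"\[ss]",  # sharp s
    "\u00a9": r"\[co]",  # copyright sign
}


def to_groff_ascii(text: str) -> str:
    """Return text with every non-ASCII character replaced by a groff escape."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII passes through unchanged (a real converter would also
            # need to escape backslashes and leading dots).
            out.append(ch)
        elif ch in GROFF_NAMED:
            # Latin-1 style case: a named character exists.
            out.append(GROFF_NAMED[ch])
        else:
            # No named character (e.g. o/u with double acute): use the
            # generic \[uXXXX] form with at least four uppercase hex digits.
            out.append("\\[u%04X]" % ord(ch))
    return "".join(out)


if __name__ == "__main__":
    print(to_groff_ascii("caf\u00e9"))
    print(to_groff_ascii("f\u0151 \u0171r"))
```

The alternative the question raises, leaving such characters as raw UTF-8, would correspond to simply appending ch in the final branch.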
