Follow-up Comment #2, bug #58796 (project groff):

Hi Dave,

> a bit of a hack

Not so much, actually. Making good use of pipes is among the design principles of the whole roff ecosystem, to harmonize with the overall UNIX design philosophy that every tool should solve one task only, but solve it well and in a way that facilitates combination with the other tools. In this sense, groff is actually more UNIXy than mandoc, which does integrate preconv.

> wrapper for iconv

I would hate it if groff started requiring iconv. I consider it an important asset that so far, it does not.

> the language has standard libraries to handle UTF-8

Yes, indeed the C language contains a vast array of C library functions to deal with wide characters and with multibyte characters. But the design of these C library facilities is atrocious, and using something else which is non-standard would be even worse. Either way, rewriting a program to natively support wide characters is usually an extremely tedious, extremely intrusive, very time-consuming and highly error-prone task. Even when done as designed, it adds horrible complication to the code and makes the code much more fragile. For small programs, ways exist to cheat one's way around these notorious downsides, see my presentation at EuroBSDCon in Beograd a few years ago. But i doubt something like that could be pulled off for a program as large as groff, at least not easily.

> not sure why preconv need emit things like \['e] or \[u00E9] at all

Because single-byte 8-bit locales have been obsolete for many years and some operating systems don't even support them any longer. And even for people using Linux: almost nobody uses LC_CTYPE=*.Latin-1 nowadays, which implies that without such escapes, you could no longer look at the preconv output with a pager. When you do groff-specific encoding anyway, it's much better to encode all non-ASCII characters and not force users to adopt an obsolete locale.

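To illustrate what "encode all non-ASCII characters" amounts to, here is a minimal Python sketch. It is not preconv's actual code: the function name encode_non_ascii is made up for this example, and it only produces the \[uXXXX] form, not named glyphs such as \['e]. It also assumes the input has already been decoded to Unicode text.

```python
def encode_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character replaced by a
    groff-style \\[uXXXX] escape (uppercase hex, at least four digits).

    Hypothetical helper for illustration only; not taken from preconv.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # Everything else becomes an escape, so the result is pure ASCII.
            out.append("\\[u%04X]" % cp)
    return "".join(out)
```

Because the result is pure US-ASCII, it can be viewed with a pager in any locale; for example, encode_non_ascii("café") yields caf\[u00E9].
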
While in general, i hate adding options to programs, in particular when it can be expected that they will be used rarely, i do see that an occasional need for what Brandon asks for might arise.

When picking new options, please don't forget to look at https://mandoc.bsd.lv/man/man.options.1.html - the groff/man option space is seriously crowded already, and having several programs in a single package, or in two very closely related packages, that all use the same option letter but each one for a different purpose isn't user-friendly at all.

Either way, i would judge this task as somewhat low-priority. It applies only when you want to maintain the document source in US-ASCII (which implies there are only occasional non-ASCII characters in it, otherwise you would surely maintain the document source in UTF-8 in the first place), yet there are enough stray wide characters in it that you want to encode them automatically rather than just fixing them manually one by one. That situation may occasionally occur, but not all that often, i think.

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58796>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
