As an experiment, I tried building groff from source (from the git repo) after converting all Latin-1 files to UTF-8.
The build appeared to succeed, but there were about 9000 lines of diagnostics about invalid input characters. So obviously a naive approach isn't going to work.

Apparently groff doesn't do well with UTF-8 input. I'd like to see that changed, but I don't know nearly enough about groff to even start that work, or to speculate about whether it would be a good idea.

Meanwhile, I suggest converting only files that are treated as plain text (NEWS, ChangeLog.*, */README, etc.), just to make things a bit easier for human readers.

Thoughts?

On Fri, Apr 10, 2026 at 11:53 AM Keith Thompson <[email protected]> wrote:
>
> There are a number of Latin-1 (ISO 8859-1) files in the groff source
> distribution.
>
> I suggest that it would be better for most or all of these to be converted to
> UTF-8. On my system, these files do not display correctly, since I have my
> system configured to use UTF-8 by default. I think most people are likely to
> be in a similar situation.
>
> For example, line 56 of the NEWS file appears on my system as:
>
>     `WE` no longer re�nable it. This change makes groff mm consistent
>
> Converting to UTF-8, it appears as:
>
>     `WE` no longer reënable it. This change makes groff mm consistent
>
> If there's a consensus that these files should be converted to UTF-8, I
> volunteer to submit a patch.
>
> I haven't closely examined all the relevant files, but for example I'm not
> sure what to do about tmac/fr.tmac lines 160-174. (The same file also has a
> Latin-1 accented letter in a comment on line 5.)
>
> --
> Keith Thompson
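
For reference, the plain-text-only conversion suggested at the top of this message could be sketched roughly as follows. This is only an illustration, not a proposed patch: the pattern list, the function names, and the dry-run default are all assumptions made for the example, and the real set of files would need to be checked by hand. It leaves alone any file that is already pure ASCII or valid UTF-8, and never touches files such as tmac/fr.tmac that groff itself reads.

```python
from __future__ import annotations

import fnmatch
import os
import sys

# Hypothetical pattern list, based only on the examples named in the message
# (NEWS, ChangeLog.*, */README). Not an authoritative list of groff files.
PLAIN_TEXT_PATTERNS = ["NEWS", "ChangeLog*", "README"]


def is_plain_text_name(name: str) -> bool:
    """True if the bare file name matches one of the plain-text patterns."""
    return any(fnmatch.fnmatchcase(name, p) for p in PLAIN_TEXT_PATTERNS)


def to_utf8(data: bytes) -> bytes | None:
    """Return UTF-8 bytes for Latin-1 input, or None if no change is needed."""
    try:
        data.decode("utf-8")
        return None  # pure ASCII or already valid UTF-8: leave untouched
    except UnicodeDecodeError:
        # Every byte value is valid Latin-1, so this decode cannot fail.
        return data.decode("latin-1").encode("utf-8")


def convert_tree(root: str, dry_run: bool = True) -> list[str]:
    """Convert matching files under root; return the paths that need changes."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into the repository data
        for name in filenames:
            if not is_plain_text_name(name):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            converted = to_utf8(data)
            if converted is None:
                continue
            changed.append(path)
            if not dry_run:
                with open(path, "wb") as f:
                    f.write(converted)
    return changed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Dry run by default: only list the files that would be rewritten.
    for path in convert_tree(sys.argv[1], dry_run=True):
        print(path)
```

One caveat: a Latin-1 file whose non-ASCII bytes happen to form valid UTF-8 sequences would be skipped by the check in `to_utf8`. That is unlikely for ordinary text, but the list of changed files is worth reviewing before committing anything.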
