Guillem Jover <[email protected]> writes:

> Ah right, indeed it does. And it's explained in that same man page I
> referred. O:) The escape sequence would be something like \[u0021] or
> \[u0041_0300].

Oh! So, if I can just convert all Unicode characters to their numeric codes, this becomes very easy to do. No tables or other machinery required.

I'm a little worried about the \[u0041_0300] form, though. Does that mean that \[u0041]\[u0300] does not work, and Pod::Man has to know whether characters are combining or not? I suppose that's possible with Perl's Unicode support, if necessary. Are the numbers there the hex digits of a Unicode code point? The groff_char man page is maddeningly light on details about this escape form, mentioning it only in a REFERENCE section.

>> For Pod::Man usage, the output format I'd want would be a hash mapping
>> Unicode code points to the correct groff escape. Or, in an absolutely
>> ideal world, to have an Encode encoding for groff escapes, similar to how
>> the Encode::MIME::Header encoding works to generate RFC 2047 strings.

> I happened to stumble over an old patch by Brendan O'Dea that might be
> helpful, including a reference here to not lose track of that:
>
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=442066;filename=groff-utf8;msg=22>

Oh, aha, that's basically the table I was looking for, although it covers only a small subset of Unicode, so it seems easier to just do a straight conversion to the \[uNNNN] form.

>> B<> and I<> could just be surrounding normal words that should use
>> normal hyphens. L<some-command> is a link to a section in the same
>> document entitled some-command, so the assumption there is also that it
>> could be a regular English word.

> Oh, at least perlpod(1) says that L<name> links to a Perl manual page,
> so I'd expect it to be equivalent to the L<crontab(5)> style when
> processing minus chars, and L</sec> does the inter-section linking?

Oh, sorry, yes, I was thinking of L</some-command>. So the idea is that L<some-command> should always use \- for all embedded hyphens?
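
For the straight conversion to the \[uNNNN] form discussed above, a minimal sketch might look like the following. It is written in Python purely for illustration (Pod::Man itself is Perl, where ord() and sprintf("%04X", ...) would do the same job), and it rests on assumptions the thread has not settled: that the digits are uppercase hexadecimal code points padded to at least four digits, and that a base character followed by combining marks should use the composite \[u0041_0300] form rather than separate \[u0041]\[u0300] escapes. The function name groff_escape is hypothetical.

```python
import unicodedata


def groff_escape(text: str) -> str:
    """Sketch: turn non-ASCII text into groff \\[uXXXX] escapes.

    Assumes groff wants uppercase hex code points (at least four digits) and
    the composite \\[uBASE_MARK] form when a character is followed by
    combining marks, e.g. "A" + U+0300 -> \\[u0041_0300].
    ASCII passes through unchanged; roff metacharacters such as a backslash
    are NOT escaped here.
    """
    out = []
    i = 0
    while i < len(text):
        # Gather this character plus any combining marks that follow it,
        # using a nonzero canonical combining class as the test.
        j = i + 1
        while j < len(text) and unicodedata.combining(text[j]):
            j += 1
        cluster = text[i:j]
        if len(cluster) == 1 and ord(cluster) < 0x80:
            out.append(cluster)  # plain ASCII, left alone
        else:
            # Composite name: "u" once, then each code point joined by "_".
            digits = "_".join("%04X" % ord(c) for c in cluster)
            out.append("\\[u" + digits + "]")
        i = j
    return "".join(out)


if __name__ == "__main__":
    for sample in ["caf\u00e9", "A\u0300", "plain-ascii"]:
        print(groff_escape(sample))
```

Under those assumptions, groff_escape("caf\u00e9") produces caf\[u00E9], and "A" followed by U+0300 produces \[u0041_0300]. Python's unicodedata.combining() only reports the canonical combining class, so a real implementation might prefer testing the Unicode general category (Mn/Mc/Me); in Perl, matching \p{Mark} would be the natural equivalent.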

>> As you say, though, I'm not entirely sure the distinction is worth all
>> the trouble we've put into it over the years. nroff at least seems to
>> have just given up and maps them all to "-" in the output anyway. That
>> used to be a Debian-specific change, but it looks like upstream has
>> switched to treating - as \-, I think? For HTML output, upstream maps
>> \- to − and Debian still overrides that to - instead. (If
>> upstream thinks \- is a minus sign and not ASCII 45, I'm really
>> confused what's going on with this, though.)

> We should probably ask Colin about this. :)

Yes, please -- Colin, do you have any idea what the current best practice is here? I'm trying to figure out what to have Pod::Man do.

-- 
Russ Allbery ([email protected])             <http://www.eyrie.org/~eagle/>

