[bug #64360] [PATCH] [gropdf] does not correctly handle white space after 'w' command

Deri James Thu, 05 Feb 2026 08:46:57 -0800

Follow-up Comment #36, bug #64360 (group groff):

On Thursday, 5 February 2026 02:55:02 GMT G. Branden Robinson wrote:
> Follow-up Comment #35, bug #64360 (group groff):
>
> [comment #34:]
>
>> Discussion about this issue has temporarily jumped ship to bug #67992.
>> Hopefully it finds its way back here, as it has fsck-wall to do with
>> #67992's topic.
>
> Not for very long.  It pretty much jumped back.  Bug #67992 still mostly has
> to do with the issue I just posted to the _groff_ list about.
>
> https://lists.gnu.org/archive/html/groff/2026-02/msg00025.html
>
> As noted there, it's now a documentation issue.
>
> _This_ issue is still awaiting Deri's feedback.
>
> Here's the part that jumped (from comment 15 to bug #67992).  The material
> not prefixed with ">" is me.
>
>> Follow-up Comment #14, bug #67992 (group groff):
>>
>> On Sunday, 1 February 2026 04:51:11 GMT G. Branden Robinson wrote:
>>> Follow-up Comment #11, bug #67992 (group groff):
>> [...]
>>
>>> Deri, for example, is rigid in his expectations of GNU troff's
>>> output format ("grout", as I term it.)  Where documentation doesn't
>>> support his rigidity, he points to the implementation as a
>>> specification from which no deviation should be permitted; see bug
>>> #63544.
>>
>> I provided a one line perler which achieved your desire for more
>> readable grout (the "wish" of that bug):-
>>
>> perl -pe 's/(.)(.*)/$1\n$2/ if m/^w/; s/^(.)(\S.*)/$1 $2/mg' zfile
>>
>> I hope you are still using it.
>
> I am not.  I do not see the value in filtering GNU troff's output to
> pre-digest it into a form gropdf requires but which no other groff
> output drivers do.


I see your misunderstanding of the purpose of the Perl one-liner. The wish bug
was to produce grout more readable by humans, which is no benefit to most
users but may help developers searching for bugs. So I wrote the Perl
one-liner so that a human would see the grout as you would like to see it
without actually changing the current format of the grout. It is not intended
as a filter for gropdf, merely a simple way of achieving the purported reason
for the ticket. I was hoping that would be sufficient to close the ticket.

Since it is difficult to compare versions of PDFs, when a visual difference is
spotted, the easiest way of finding what changed between the versions is to
compare the two grout files. If the format of the grout file is altered
between versions, comparison becomes more difficult. Once the two grouts
are confirmed compatible by diffing them, any visual difference in the PDF
must be due to changes I have introduced in gropdf.
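
The comparison described above can be sketched with Python's difflib. The
function name grout_diff() and the file labels are hypothetical; plain diff(1)
on the two files does the same job.

```python
import difflib

def grout_diff(old_lines, new_lines,
               old_name="old.grout", new_name="new.grout"):
    """Return a unified diff of two grout files given as lists of lines.

    An empty result means the two grouts are identical, so any visual
    difference in the PDFs has to come from the output driver.
    """
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name,
                                     lineterm=""))

if __name__ == "__main__":
    # Illustrative lines only: a format change shows up as noise in the diff.
    old = ["tfoo", "wh250"]
    new = ["t foo", "wh250"]
    for row in grout_diff(old, new):
        print(row)
```

If the grout format itself changes between groff versions, every affected line
appears in the diff, which is what makes real differences harder to spot.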

> See:
> https://cgit.git. ... s.sh?h=1.24.0.rc2
>
> (Whoops--stale comment in there.  I'll fix that.)

I'm confused,
>
>> I am expecting changes to grout and groff fonts (v2) when you complete
>> full utf8 throughput.
>
> I don't plan "full UTF-8 throughput".

I was being too succinct to be understandable!

Input is UTF-8 (and may include groff named characters, e.g. \[cq], \[u03B9]).

Each character is then converted to its Unicode Code Point (UCP) char32_t
(good choice) and becomes a text node (I'd also store the actual input UTF-8
or groff char name as a wchar_t string in the text node as well; it makes
asciify a doddle). This also means that only text nodes carry document stream
text.
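
A minimal sketch of that idea, in Python rather than C++ for brevity. All the
names here (TextNode, make_nodes, asciify, NAMED) are hypothetical and say
nothing about how GNU troff is or will be implemented; the glyph table holds a
single entry, \[cq] as U+2019, purely for illustration.

```python
import re
from dataclasses import dataclass

# Tiny illustrative table of groff special character names (assumption:
# a real table would cover every name in groff_char(7)).
NAMED = {"cq": 0x2019}

@dataclass
class TextNode:
    code_point: int   # what the formatter works with (char32_t in C++)
    source: str       # the spelling exactly as it appeared in the input

# Either a \[name] escape or any single character.
TOKEN = re.compile(r"\\\[([^\]]+)\]|(.)", re.S)

def make_nodes(text):
    """Turn input text into text nodes that remember their input spelling."""
    nodes = []
    for m in TOKEN.finditer(text):
        name, literal = m.group(1), m.group(2)
        if literal is not None:
            nodes.append(TextNode(ord(literal), literal))
            continue
        u = re.fullmatch(r"u([0-9A-F]{4,6})", name)
        cp = int(u.group(1), 16) if u else NAMED[name]
        nodes.append(TextNode(cp, m.group(0)))
    return nodes

def asciify(nodes):
    """Recover the original input by joining the stored spellings."""
    return "".join(n.source for n in nodes)
```

With the input spelling kept in each node, getting the original text back is
just a join over the nodes, with no reverse lookup of code points needed.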

> I don't plan to use UTF-8 for GNU troff's internal storage of
> formattable character objects (rather, I plan to use `char32_t`) and
> moreover, even if I did use UTF-8 internally, the input would still have to
> undergo some form of Unicode normalization; probably Normalization Form
> D given the program's orientation toward typesetting and support for
> glyph composition by overstriking.  groff_char(7) has cautioned the
> reader to prepare for normalization of input for several years.

NFD is used for searching and sorting INPUT characters; it specifically has
nothing to do with typesetting. This ticket:-

https://savannah.gnu.org/bugs/?67244

(which I can't access at the moment, so I can't check the current state of
play) documents an issue with using any Unicode normalisation form. The
problem in this case involves characters which have different forms depending
on context. This is a bit like ligatures: 'f' has a UCP, but when it is
followed by another 'f' a different UCP is appropriate IF the font being used
supports both UCPs (f=0066, ff=FB00). If you are searching text for 'ff' you
would hope to find both 'ff' and 'ﬀ'. The same is true for the iota
character: it has different forms depending on context, but for
searching/sorting, which is what normalisation is concerned with, both forms
mean iota, so they are marked as equivalent input characters. groff, however,
is concerned with the output typesetting (for grops/gropdf at least), and
after normalisation the difference in output form is lost.
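
The effect can be seen with Python's unicodedata module. This is a sketch
under an assumption: I am using U+1FBE (GREEK PROSGEGRAMMENI) as an example of
an iota variant, and I can't confirm from here that it is the character in
that ticket. The helper name forms() is made up.

```python
import unicodedata

def forms(text):
    """Return the (NFD, NFKD) normalisations of a string."""
    return (unicodedata.normalize("NFD", text),
            unicodedata.normalize("NFKD", text))

LIGATURE_FF = "\ufb00"      # LATIN SMALL LIGATURE FF
PROSGEGRAMMENI = "\u1fbe"   # assumed example of a contextual iota form
IOTA = "\u03b9"             # GREEK SMALL LETTER IOTA

if __name__ == "__main__":
    for ch in (LIGATURE_FF, PROSGEGRAMMENI):
        nfd, nfkd = forms(ch)
        print("U+%04X" % ord(ch),
              "NFD:", [hex(ord(c)) for c in nfd],
              "NFKD:", [hex(ord(c)) for c in nfkd])
```

In this example NFD already maps U+1FBE to plain iota, so a formatter that
normalises its input can no longer tell which form was asked for. The ff
ligature survives NFD and is only split into two 'f's by the compatibility
forms (NFKD/NFKC).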


> Expecting an identity mapping between UTF-8 code points on input and the
> glyph encoding on output is unrealistic.
>


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64360>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
