Follow-up Comment #9, bug #66653 (group groff):

Hi Deri,

A few days ago I started digging into this at last.

At 2025-01-10T12:13:28-0500, Deri James wrote:
> Date: Fri 10 Jan 2025 05:13:24 PM UTC By: Deri James <deri>
> First I will explain what I am trying to accomplish, before describing
> the issue.
>
> With the demise of pdfmark.pdf and mspdf.tmac I realised that creating
> a replacement would be a lot easier if we did a similar job Branden
> did for man but this time for ms. Branden is correct that Keith's
> original pdf work is more unixy than roffish, using commands on a
> single line and various flags, and his mspdf.tmac provides new
> commands which are wrappers around existing ms commands with added pdf
> extensions. So instead of:-
>
> .NH 1
> Introduction
>
> It supports:-
>
> .NH 1
> .XN Introduction
>
> Apart from having to learn new commands it is also less flexible. Ms
> allows:-
>
> .NH 1
> Introduction to
> .I groff
>
> Which would have to be converted to inline font changes to work with
> .XN.

Right.  I've long been uneasy with this departure from what is idiomatic
for ms(7) (and to some extent mm(7)) documents.

> It also means that existing ms documents can't magically start using
> pdf features without considerable editing. The gold standard
> "solution" would be if the output was pdf then pdf features are
> automatically included, the same as Branden did with man.

It won't surprise you that that's what I think.  Getting man page
authors to adopt new habits is just shy of futile; where possible, I
think the path to success lies in making existing features "just work".

Sometimes that's a lot easier said than done; my quest to resolve this
problem has forced me to learn a lot about deep formatter internals,
including stuff that's never really been documented.  It also produced a
few struggles between us to make sense of what's really going on in
those mysterious internals.

> As a proof of concept I attempted to make .NH produce a document
> outline as well as headings. I used ms.ms as a test document,

That's an ambitious guinea pig, since ms.ms exercises most of the
package's own features.  On the bright side, if we can tame this beast,
I expect most user documents to fall to our sword as well.

> using the command
> "test-groff -Tpdf -ms -M. -mpdfms -pet -ww ../doc/ms.ms > msdj.pdf",
> the result is attached.

Looks excellent!  The only thing I expected that I didn't get was
hotspots in the document's table of contents at the end.  However, since
there's a lovely navigation bar (in a sufficiently brainful PDF viewer,
meaning NOT DROPBOX), that's an inconsequential defect.

Support for hotspotted external URLs might be more important.

That in turn might mean adding `UR` and `UE` macros to ms(7).  As I've
noted elsewhere, I am not a fan of "www.tmac"'s approach.

> Seems to like pic.ms too.

I know from hard experience that that document is good at breaking
grohtml; may gropdf fare better.

> Now onto the problem with .asciify. Because I have to use a diversion
> to capture the line(s) after the .NH the diversion contains nodes,

Is it necessary to use a diversion?  The way the oldest macro packages
handled this was by storing the document's input lines into a _macro_.
I think this approach _might_ be fruitful because then you're guaranteed
not to have to deal with any node objects.

However, I haven't tried this and it might have downsides I don't
anticipate.

> fine when the diversion is output as a heading but needs to be
> converted back to text to be used for the bookmark. Traditionally this
> was done by calling .asciify which converted glyph nodes and word
> space nodes back to text:-
>
> {type: glyph_node, character: "F", diversion level: 1},
> {type: glyph_node, character: "i", diversion level: 1},
> {type: glyph_node, character: "r", diversion level: 1},
> {type: glyph_node, character: "s", diversion level: 1},
> {type: glyph_node, character: "t", diversion level: 1},
> {type: word_space_node, diversion level: 1},
>
> But, oddly, fails to convert:-
>
> {type: glyph_node, character: "\u260E", diversion level: 1},
>
> Back to \[u260E], possibly the code was written before the advent of
> preconv.

That sounds like a bug I want to fix.

Let me see if it's live in Git today.


$ cat ATTIC/unicode-char-in-diversion.groff
.box DIV
hello \[u260E] world
.br
.box
.DIV
.asciify DIV
.DIV
\X'pdf: ignoreme \*[DIV]'
$ ./build/test-groff -bww -T pdf -a ATTIC/unicode-char-in-diversion.groff
<beginning of page>
troff: backtrace: 'ATTIC/unicode-char-in-diversion.groff':6: string 'DIV'
troff: backtrace: file 'ATTIC/unicode-char-in-diversion.groff':8
troff:ATTIC/unicode-char-in-diversion.groff:8: warning: a node is not
encodable in device-independent output
troff: backtrace: 'ATTIC/unicode-char-in-diversion.groff':6: string 'DIV'
troff: backtrace: file 'ATTIC/unicode-char-in-diversion.groff':8
troff:ATTIC/unicode-char-in-diversion.groff:8: warning: missing closing
delimiter in device extension escape sequence; expected character "'", got a
newline
hello <u260E> world hello <u260E> world '


Hmm, yup.  Looks like a bug in `asciify` all right.
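
To pin down what I'd expect instead, here is a toy model in Python.  It
is emphatically not GNU troff's implementation; the dictionary-based
node representation and the function name `asciify_model` are
assumptions of mine, borrowed from the shape of the node dump quoted
above.  The only point is that a non-ASCII glyph node should come back
as a `\[uXXXX]` special character escape sequence rather than being
left as a node.

```python
def asciify_model(nodes):
    """Turn a list of (hypothetical) node dicts back into roff text."""
    out = []
    for node in nodes:
        kind = node["type"]
        if kind == "word_space_node":
            out.append(" ")
        elif kind == "glyph_node":
            ch = node["character"]
            if ord(ch) < 128:
                # Plain ASCII glyphs map back to themselves.
                out.append(ch)
            else:
                # Everything else becomes a \[uXXXX] escape sequence
                # (at least four uppercase hexadecimal digits).
                out.append("\\[u%04X]" % ord(ch))
        # Simplification: any other node type is silently skipped.
    return "".join(out)


nodes = [{"type": "glyph_node", "character": c} for c in "First"]
nodes.append({"type": "word_space_node"})
nodes.append({"type": "glyph_node", "character": "\u260E"})

print(asciify_model(nodes))  # prints: First \[u260E]
```

Composite glyphs, kerning, and italic corrections are ignored here; the
real request would have to decide what to do with each of those.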

While exploring the issues you've raised here, I've discovered that
`chop` is not exactly fully armed and operational.  It only "kind of"
works, but satisfies the one use case that motivated it; see bug #67453.

Solving it properly looks like a heavy lift.  GNU troff has yet more
bespoke container types for macro/string/diversion contents, `char_list`
and `node_list`, and I lack the courage to refactor these away before
the 1.24 release.

I will not be the least surprised if the issues arising thence are
entangled with bug #62264 and bug #64004.

> Now, the issues this code uncovered.
>
> 1. Near the top of the pdfms.tmac there are redefinitions of .B and
> .BI which replace the ones which are generated when s.tmac is loaded.
> If you comment out the definitions and produce the pdf there are no
> errors/warnings but some overview entries are truncated. You can see
> this in the two subsidiary entries to "Legacy Features" which both get
> truncated to one word by asciify if the s.tmac version of .I is used,
> but not if using the redefined version. One difference is that the
> s.tmac versions introduce italic correction escapes which may be what
> is upsetting asciify.

I think you're right.

> 2. The shw macro just calls pdfbookmark, so largely redundant, why not
> call pdfbookmark directly instead of shw. If you replace shw with a
> direct call to pdfbookmark, groff coredumps with an assert failure. I
> believe this is the same as a bug number we already have, but the
> reproducer started working after removal of asciify from pdf.tmac. So
> I have also attached a minimum example (it.trf) which reproduces the
> coredump if you use pdfms.tmac with the direct call to pdfbookmark
> rather than the indirection through shw.

I still need to look into this.

> This can all be put on the back burner until after the current release
> I'd just appreciate an affirmation that we agree this is the best way
> forward in replacing mspdf.tmac. I started on .NH because I think if
> we have a workable solution for that I think it is likely all pdf
> features can be slotted in without a separate api, which is what Keith
> used.
>
> Any good ideas, or better approaches, would be appreciated.

I think you're on the right track.  We just need to dig some
boulders out of the formatter's roadway so your changes can pass easily.

Regards,
Branden



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66653>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
