Follow-up Comment #9, bug #66653 (group groff):

Hi Deri,

A few days ago I started digging into this at last.

At 2025-01-10T12:13:28-0500, Deri James wrote:
> Date: Fri 10 Jan 2025 05:13:24 PM UTC By: Deri James <deri>
> First I will explain what I am trying to accomplish, before describing
> the issue.
>
> With the demise of pdfmark.pdf and mspdf.tmac I realised that creating
> a replacement would be a lot easier if we did a similar job Branden
> did for man but this time for ms. Branden is correct that Keith's
> original pdf work is more unixy than roffish, using commands on a
> single line and various flags, and his mspdf.tmac provides new
> commands which are wrappers around existing ms commands with added pdf
> extensions. So instead of:-
>
> .NH 1
> Introduction
>
> It supports:-
>
> .NH 1
> .XN Introduction
>
> Apart from having to learn new commands it is also less flexible. Ms
> allows:-
>
> .NH 1
> Introduction to
> .I groff
>
> Which would have to be converted to inline font changes to work with
> .XN.

Right. I've long been uneasy with this departure from what is
idiomatic for ms(7) (and to some extent mm(7)) documents.

> It also means that existing ms documents can't magically start using
> pdf features without considerable editing. The gold standard
> "solution" would be if the output was pdf then pdf features are
> automatically included, the same as Branden did with man.

It won't surprise you that that's what I think. Getting man page
authors to adopt new habits is just shy of futile; where possible, I
think the path to success lies in making existing features "just work".

Sometimes that's a lot easier said than done; my quest to resolve this
problem has forced me to learn a lot about deep formatter internals,
including stuff that's never really been documented. It also produced
a few struggles between us to make sense of what's really going on in
those mysterious internals.

> As a proof of concept I attempted to make .NH produce a document
> outline as well as headings.
> I used ms.ms as a test document,

That's an ambitious guinea pig, since ms.ms exercises most of the
package's own features. On the bright side, if we can tame this beast,
I expect most user documents to fall to our sword as well.

> using the command
> "test-groff -Tpdf -ms -M. -mpdfms -pet -ww ../doc/ms.ms > msdj.pdf",
> the result is attached.

Looks excellent!

The only thing I expected that I didn't get was hotspots in the
document's table of contents at the end. However, since there's a
lovely navigation bar (in a sufficiently brainful PDF viewer, meaning
NOT DROPBOX), that's an inconsequential defect.

Support for hotspotted external URLs might be more important. That in
turn might mean adding `UR` and `UE` macros to ms(7). As I've noted
elsewhere, I am not a fan of "www.tmac"'s approach.

> Seems to like pic.ms too.

I know from hard experience that that document is good at breaking
grohtml; may gropdf fare better.

> Now onto the problem with .asciify. Because I have to use a diversion
> to capture the line(s) after the .NH the diversion contains nodes,

Is it necessary to use a diversion? The way the oldest macro packages
handled this was by storing the document's input lines into a _macro_.
I think this approach _might_ be fruitful because then you're
guaranteed not to have to deal with any node objects. However, I
haven't tried this and it might have downsides I don't anticipate.

> fine when the diversion is output as a heading but needs to be
> converted back to text to be used for the bookmark.
> Traditionally this
> was done by calling .asciify which converted glyph nodes and word
> space nodes back to text:-
>
> .{type: glyph_node, character: "F", diversion level: 1},
> {type: glyph_node, character: "i", diversion level: 1},
> {type: glyph_node, character: "r", diversion level: 1},
> {type: glyph_node, character: "s", diversion level: 1},
> {type: glyph_node, character: "t", diversion level: 1},
> {type: word_space_node, diversion level: 1},
>
> But, oddly, fails to convert:-
>
> {type: glyph_node, character: "\u260E", diversion level: 1},
>
> Back to \[u260E], possibly the code was written before the advent of
> preconv.

That sounds like a bug I want to fix. Let me see if it's live in Git
today.

$ cat ATTIC/unicode-char-in-diversion.groff
.box DIV
hello \[u260E] world
.br
.box
.DIV
.asciify DIV
.DIV
\X'pdf: ignoreme \*[DIV]'
$ ./build/test-groff -bww -T pdf -a ATTIC/unicode-char-in-diversion.groff
<beginning of page>
troff: backtrace: 'ATTIC/unicode-char-in-diversion.groff':6: string 'DIV'
troff: backtrace: file 'ATTIC/unicode-char-in-diversion.groff':8
troff:ATTIC/unicode-char-in-diversion.groff:8: warning: a node is not encodable in device-independent output
troff: backtrace: 'ATTIC/unicode-char-in-diversion.groff':6: string 'DIV'
troff: backtrace: file 'ATTIC/unicode-char-in-diversion.groff':8
troff:ATTIC/unicode-char-in-diversion.groff:8: warning: missing closing delimiter in device extension escape sequence; expected character "'", got a newline
hello <u260E> world
hello <u260E> world
'

Hmm, yup. Looks like a bug in `asciify` all right.

While exploring the issues you've raised here, I've discovered that
`chop` is not exactly fully armed and operational. It only "kind of"
works, but satisfies the one use case that motivated it; see
bug #67453. Solving it properly looks like a heavy lift.
GNU troff has yet more bespoke container types for
macro/string/diversion contents, `char_list` and `node_list`, and I
lack the courage to refactor these away before the 1.24 release. I
will not be the least surprised if the issues arising thence are
entangled with bug #62264 and bug #64004.

> Now, the issues this code uncovered.
>
> 1. Near the top of the pdfms.tmac there are redefinitions of .B and
> .BI which replace the ones which are generated when s.tmac is loaded.
> If you comment out the definitions and produce the pdf there are no
> errors/warnings but some overview entries are truncated. You can see
> this in the two subsidiary entries to "Legacy Features" which both get
> truncated to one word by asciify if the s.tmac version of .I is used,
> but not if using the redefined version. One difference is that the
> s.tmac versions introduce italic correction escapes which may be what
> is upsetting asciify.

I think you're right.

> 2. The shw macro just calls pdfbookmark, so largely redundant, why not
> call pdfbookmark directly instead of shw. If you replace shw with a
> direct call to pdfbookmark, groff coredumps with an assert failure. I
> believe this is the same as a bug number we already have, but the
> reproducer started working after removal of asciify from pdf.tmac. So
> I have also attached a minimum example (it.trf) which reproduces the
> coredump if you use pdfms.tmac with the direct call to pdfbookmark
> rather than the indirection though shw.

I still need to look into this.

> This can all be put on the back burner until after the current release
> I'd just appreciate an affirmation that we agree this is the best way
> forward in replacing mspdf.tmac. I started on .NH because I think if
> we have a workable solution for that I think it is likely all pdf
> features can be slotted in without a separate api, which is what Keith
> used.
>
> Any good ideas, or better approaches, would be appreciated.

I think you're on a good directional track.
We just need to dig some boulders out of the formatter's roadway so
your changes can pass easily.

Regards,
Branden

_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?66653>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/