On Sat, Jul 05, 2014 at 20:08:02 +0000, David Holland wrote:

> Another possibility is to write a PDF driver for the groff we have.
> This is still work, but possibly not that much (e.g. maybe one could
> rewrite the upstream gplv3 Perl groff driver in Lua...); on the other
> hand the artefact it produces is of limited long-term value. And it's
> not like groff is doing a super job of typesetting the articles...
> and there's long been a desire to kick groff out of base.

I took a look at what needs to be done for this and hacked together a
quick prototype (using the Perl PDF::Haru binding for libharu) that
handles text only (no D commands from groff_out(5)). It's about 200
lines of sparse code, though without much error handling and without a
full-blown groff_out(5) parser that could handle arbitrary input rather
than just what groff produces.

I haven't touched PS/PDF in a while, but I hoped it could be simple,
and unfortunately it's not. (Exaggerating a bit, you can generate PS
from groff intermediate output with sed; I was hoping for this level
of "simple".) PDF is much more restrictive than PS, and there are some
obstacles in mapping groff_out(5) to PDF. As far as I can tell, a PDF
driver must have access to font metrics, which seems gross, as groff
has already done all the layout.

One obstacle is that the c/C commands do not change the current
position. Since showing text in PDF does change the position, that
change needs to be undone in the generated PDF. Unfortunately
gsave/grestore is not available for this in PDF, so you need to know
the width of the character to emit the move backwards. [In my
prototype I used a totally gross hack of printing the same char
backwards in invisible rendering mode, which gets me the right
position without knowing the character width, but doesn't play nice
with some programs that extract text from PDF.]

Another obstacle is that the PDF text matrix is a separate matrix that
is gone when the text object ends. This makes it impossible, it seems,
to mix text and graphics without tracking the current position, so,
again, you need to know character widths.

Neither of these is unsolvable (or even hard), but both feel icky.

-uwe
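
For illustration only, here is a rough sketch of the width-based way to
handle a c command when the metrics are available. It is not the
prototype's code. It relies on the PDF TJ operator, where a number in
the array is subtracted from the horizontal position in thousandths of
the font size, so showing a glyph followed by its own width nets out to
no movement. The WIDTHS table and both function names are hypothetical;
a real driver would load widths from the font's metrics, and the sketch
assumes character spacing and word spacing are zero and horizontal
scaling is 100%.

```python
# Illustrative glyph widths in 1/1000 of the font size. This table is a
# hypothetical stand-in; a real driver would read the font's metrics.
WIDTHS = {"A": 667, "a": 556, " ": 278}


def pdf_escape(ch):
    """Escape the characters that are special in a PDF literal string."""
    return {"(": r"\(", ")": r"\)", "\\": r"\\"}.get(ch, ch)


def show_without_advance(ch, widths=WIDTHS):
    """Return a TJ operation that shows ch and then steps back by its width.

    Assumes Tc = 0, Tw = 0 and 100% horizontal scaling, so the glyph's
    advance and the backward adjustment cancel exactly.
    """
    return "[({}) {}] TJ".format(pdf_escape(ch), widths[ch])
```

Used inside a text object, show_without_advance("A") would produce the
operation "[(A) 667] TJ", leaving the text position where it was before
the glyph was shown.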