Re: [groff] groff as the basis for comprehensive documentation?

John Gardner Fri, 20 Apr 2018 17:25:57 -0700

First, leave performance expectations at the door. The ambitious experiment
I describe below is intended to provide airtight handling for a conversion
medium which is inherently lossy (Roff -> HTML/SVG/CSS/et al, Markdown, and
Markdown with GitHub-flavoured options).



*1. Handling semantics*
We all know you can't draw semantics from cold, low-level formatting
commands. But for certain contexts - hierarchically sorted documents,
consistently indented code-samples and tables marked as tables - I believe
(okay, *hope*) it's possible to reconstruct meaning from... well, stuff
that looks like this:

n12000 0 V84000 H72000
x X devtag:.NH 1
x font 36 TB
f36s10950V84000H72000


How? See the x X devtag line? That's what inspired this whole landslide of
absurd ambition. I wondered what we could do if more metadata were provided
that way – as device-specific control strings from, say, a preprocessor.
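
As a rough illustration of how a post-processor might pick those hints out
of the intermediate output, here is a minimal Python sketch. It assumes the
tag always has the shape shown in the sample above ("x X devtag:" followed
by a macro name and optional arguments). The names DEVTAG and find_devtags
are made up for this example and are not part of groff.

```python
import re

# Matches a device control line such as "x X devtag:.NH 1".
# The exact tag shape is an assumption based on the sample above.
DEVTAG = re.compile(r"^x\s+X\s+devtag:(\S+)\s*(.*)$")


def find_devtags(lines):
    """Yield (line_index, macro, args) for every devtag control line."""
    for index, line in enumerate(lines):
        match = DEVTAG.match(line.rstrip("\n"))
        if match:
            yield index, match.group(1), match.group(2)


sample = [
    "n12000 0 V84000 H72000",
    "x X devtag:.NH 1",
    "x font 36 TB",
    "f36s10950V84000H72000",
]
tags = list(find_devtags(sample))
```

Every other line is ignored, so the same scan could sit in front of
whatever code interprets the positioning commands.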

I intend to have a complementary preprocessor (probably named infer)
perform preliminary scans in the document pipeline to unintrusively tag
regions of particular interest. "Particular interest" here refers mainly to
preprocessors like tbl, eqn and pic which generate output that's mangled
beyond recognition.

It also refers to tracking any macro packages like mdoc(7) which *may* carry
semantic meaning with their command-set. Bear in mind these are really just
hints it's dropping for the post-processor phase: it certainly doesn't
attempt to go any further than recognising unparsed requests and macro
calls. It's not trying to be a genius. It's just annotating context for
more reliable interpretation.
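
To make the idea concrete, here is a minimal sketch of what such a tagging
pass could look like. It is an assumption-laden illustration, not the
actual design: it only recognises the conventional .TS/.TE, .EQ/.EN and
.PS/.PE delimiters, and it marks them with GNU troff's .device request,
which passes its argument through to the intermediate output as an "x X"
line. The "infer:begin" and "infer:end" tag names are hypothetical.

```python
# Hypothetical mapping: opening macro -> (closing macro, preprocessor name).
REGIONS = {
    ".TS": (".TE", "tbl"),
    ".EQ": (".EN", "eqn"),
    ".PS": (".PE", "pic"),
}


def tag_regions(lines):
    """Wrap preprocessor regions in device control lines; copy the rest."""
    out = []
    closing = None  # (closing macro, name) while inside a region
    for line in lines:
        words = line.split()
        macro = words[0] if words else ""
        if closing is None and macro in REGIONS:
            closing = REGIONS[macro]
            # Tag names are invented for this sketch.
            out.append(f".device infer:begin {closing[1]}")
            out.append(line)
        elif closing is not None and macro == closing[0]:
            out.append(line)
            out.append(f".device infer:end {closing[1]}")
            closing = None
        else:
            out.append(line)
    return out
```

The pass never looks inside a region or expands anything. It only brackets
the span so the post-processor knows which preprocessor produced the
output in between.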

Now, about that...

*2. We're gonna abuse metrics as a cloudy way to predict what the reader is
supposed to see*
We know the widths and heights of each mounted device-font, their
kerning-pairs, ligatures, and lord knows what else. We milk this for all
it's worth: by plotting each glyph's bounding box in a scaled space
representing the output medium, we identify the most obvious constructs
first.
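
A very small sketch of that first step, under heavy assumptions: each glyph
has already been resolved to a position and a size in device units (the
width and height would come from the font description files), and "most
obvious constructs" is reduced here to lines sharing a baseline and their
left margins. The Glyph type and all function names are invented for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    h: int       # horizontal position of the glyph origin, in device units
    v: int       # vertical position of the baseline, in device units
    width: int   # advance width taken from the font description
    height: int  # height above the baseline


def bounding_box(g):
    """Return (left, top, right, bottom); v grows downward on the page."""
    return (g.h, g.v - g.height, g.h + g.width, g.v)


def group_into_lines(glyphs):
    """Group glyphs that share a baseline, ordered top to bottom."""
    lines = {}
    for g in glyphs:
        lines.setdefault(g.v, []).append(g)
    return [sorted(lines[v], key=lambda g: g.h) for v in sorted(lines)]


def left_margins(glyphs):
    """Left edge of each line; a shifted margin hints at an indented block."""
    return [bounding_box(line[0])[0] for line in group_into_lines(glyphs)]
```

A run of consecutive lines whose left margin sits further right than the
surrounding text would then be a candidate for an indented code sample,
which is the kind of guess the preprocessor's tags are meant to confirm or
rule out.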

This is actually where it becomes impossible to continue explaining without
illustrations or diagrams. The whole process I'm envisioning is very
indirect and depends on numerous assumptions about the output.

Now, this might turn out to be fantastic. Or it might be a flop. One way
or the other, I'm gonna have a hell of a lot of fun seeing how far I can
get and whether it's possible.
