First, leave performance expectations at the door. The ambitious experiment I describe below aims to provide airtight handling of an inherently lossy conversion (roff → HTML/SVG/CSS and the like, Markdown, and Markdown with GitHub-flavoured extensions).
*1. Handling semantics*

We all know you can't draw semantics from cold, low-level formatting commands. But in certain contexts (hierarchically structured documents, consistently indented code samples, and tables marked up as tables), I believe (okay, *hope*) it's possible to reconstruct meaning from... well, stuff that looks like this:

    n12000 0
    V84000
    H72000
    x X devtag:.NH 1
    x font 36 TB
    f36s10950V84000H72000

How? See the `x X devtag` line? That's what inspired this whole landslide of absurd ambition. I wondered what we could do if more metadata were provided that way: as device-specific control strings emitted by, say, a preprocessor.

I intend to have a complementary preprocessor (probably named infer) perform preliminary scans in the document pipeline to unobtrusively tag regions of particular interest. "Particular interest" here refers mainly to preprocessors like tbl, eqn and pic, which generate output that's mangled beyond recognition. It also refers to tracking macro packages like mdoc(7), whose command sets *may* carry semantic meaning.

Bear in mind these are really just hints that infer drops for the postprocessor to pick up: it doesn't attempt to go any further than recognising unparsed requests and macro calls. It's not trying to be a genius. It's just annotating context so the output can be interpreted more reliably. Now, about that...

*2. We're gonna abuse metrics as a cloudy way to predict what the reader is supposed to see*

We know the widths and heights of every mounted device font, their kerning pairs, ligatures, and lord knows what else. We milk this for all it's worth: by plotting each glyph's bounding box in a scaled space representing the output medium, we identify the most obvious constructs first.

This is where it becomes impossible to keep explaining without illustrations or diagrams. The whole process I'm envisioning is very indirect and relies on numerous assumptions about the output. Now, this might turn out to be fantastic.
Or it might be a flop. One way or the other, I'm gonna have a hell of a lot of fun seeing how far I can get and whether it's possible.
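Here's a rough Python sketch that makes the devtag idea from section 1 more concrete. It shows how a postprocessor *might* harvest those hints from groff intermediate output like the sample above. It's a sketch under loose assumptions, not infer's actual design or any existing postprocessor's code. `harvest_devtags` and `Hint` are hypothetical names. The sketch also understands only a small subset of groff_out(5): the `f`, `s`, `H`, `V`, `h`, `v` and `x` commands. It simply records each `devtag:` payload together with the drawing position where it appeared.

```python
import re
from dataclasses import dataclass


@dataclass
class Hint:
    """Hypothetical record: a semantic hint plus where it appeared on the page."""
    tag: str  # payload after "devtag:", e.g. ".NH 1"
    h: int    # horizontal drawing position (basic units)
    v: int    # vertical drawing position (basic units)


# Integer-argument commands that groff may run together on one line,
# e.g. "f36s10950V84000H72000".
_INT_CMD = re.compile(r"([fsHVhv])(-?\d+)")


def harvest_devtags(grout: str) -> tuple[list[Hint], dict[int, str]]:
    """Hypothetical helper: collect 'x X devtag:...' hints from a subset of
    groff intermediate output. Returns the hints (with positions) and a map
    of mounted fonts."""
    h = v = 0
    hints: list[Hint] = []
    fonts: dict[int, str] = {}  # mounting position -> font name
    for line in grout.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("x "):
            # Device control commands consume the rest of the line.
            parts = line.split(maxsplit=2)
            if len(parts) < 3:
                continue
            if parts[1] == "X" and parts[2].startswith("devtag:"):
                hints.append(Hint(parts[2][len("devtag:"):], h, v))
            elif parts[1] == "font":
                pos, name = parts[2].split()[:2]
                fonts[int(pos)] = name
            continue
        if line[0] not in "fsHVhv":
            # n (line-break notice), t/C (text), D (drawing), p (page) and
            # the rest of groff_out(5) are outside this sketch's subset.
            continue
        for cmd, arg in _INT_CMD.findall(line):
            n = int(arg)
            if cmd == "H":
                h = n   # absolute horizontal move
            elif cmd == "V":
                v = n   # absolute vertical move
            elif cmd == "h":
                h += n  # relative horizontal move
            elif cmd == "v":
                v += n  # relative vertical move
            # f (font) and s (size) don't move the drawing position.
    return hints, fonts


SAMPLE = """\
n12000 0
V84000
H72000
x X devtag:.NH 1
x font 36 TB
f36s10950V84000H72000
"""

hints, fonts = harvest_devtags(SAMPLE)
print(hints)  # [Hint(tag='.NH 1', h=72000, v=84000)]
print(fonts)  # {36: 'TB'}
```

The sketch pairs each hint with a position rather than keeping a bare list of tags. The intent is that a metrics-based pass like the one in section 2 could then tie a hint such as `.NH 1` to the glyph bounding boxes that follow it.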