Our own stylesheets are therefore divided into at least phases. First,
[…]
As far as I can see, the XSLT 3.0 stylesheets for xslTNG are also
similar in structure.
Yep. The xslTNG stylesheets go through several standard stages:
1. Normalize the logical structure (get rid of entity refs, basically)
2. Expand XIncludes
3. Upgrade from 4 to 5 if the input isn’t in a namespace
4. Process transclusions
5. Normalize the markup
6. Process annotations
7. Process external link bases
Plus a couple more that are conditional.
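To make the shape of that pipeline concrete, here is a minimal sketch in
Python. It is not how xslTNG is implemented (that is XSLT 3.0); the names
run_pipeline and upgrade_to_namespace are made up for illustration, and
the one stage shown is a toy stand-in for step 3 that only moves
elements into the DocBook namespace. A real 4-to-5 upgrade does far more.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"


def upgrade_to_namespace(root):
    """Toy stand-in for step 3: put un-namespaced elements in the DocBook namespace."""
    for el in root.iter():
        if not el.tag.startswith("{"):
            el.tag = "{%s}%s" % (DOCBOOK_NS, el.tag)
    return root


def run_pipeline(root, stages):
    """Apply each (condition, stage) pair in order; skip stages whose condition is false."""
    for condition, stage in stages:
        if condition(root):
            root = stage(root)
    return root


# Hypothetical stage list: only the conditional upgrade step is modelled here.
STAGES = [
    (lambda r: not r.tag.startswith("{"), upgrade_to_namespace),
]

doc = ET.fromstring("<article><para>Hello</para></article>")
normalized = run_pipeline(doc, STAGES)
```

Each stage takes a tree and returns a tree, so further stages (including
the conditional ones) can be added to the list without touching the
driver.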
So there is a point in these stylesheets where the input document is
in a sort of "canonical DocBook". However, this canonical format is
not documented.
That’s true.
My suggestion is that the DocBook TC standardize and document the
canonical DocBook format. Subsequently, stylesheets for transforming
The problem with a documented canonical format is that, like a “minimal
subset”, you could probably get broad agreement on 80% of it, but no two
people would have the same 80% in mind.
Another problem is that no one wants to author in the canonical format.
It’s the format that removes all markup minimization.
I could spin off the normalizing stylesheets, steps 1 to 5 above,
optional 6 and 7, into a separate package. And I suppose, that could be
documented. I don’t know if that’s a TC activity or not though as it’s
pretty application specific.
para/simpara: canonical DocBook should only support simpara. para with
block-content (tables, lists) must be transformed into a sequence of
simpara and other block-content.
That’s in your 80%, is it? :-)
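For what it's worth, the para-to-simpara split described above could be
sketched roughly like this. This is an illustration in Python, not
anything taken from the stylesheets; it assumes un-namespaced element
names, and the BLOCKS set and the split_para function are hypothetical
and deliberately incomplete.

```python
import copy
import xml.etree.ElementTree as ET

# Assumed (incomplete) set of block-level elements that may appear inside para.
BLOCKS = {"itemizedlist", "orderedlist", "table", "informaltable", "programlisting"}


def has_content(el):
    """True if the element has children or non-whitespace text."""
    return len(el) > 0 or bool((el.text or "").strip())


def split_para(para):
    """Return a list of simpara and block elements equivalent to one para."""
    result = []
    current = ET.Element("simpara")
    current.text = para.text
    for child in para:
        if child.tag in BLOCKS:
            # Close the running simpara, emit the block, start a new simpara
            # seeded with the text that followed the block.
            if has_content(current):
                result.append(current)
            block = copy.deepcopy(child)
            block.tail = None
            result.append(block)
            current = ET.Element("simpara")
            current.text = child.tail
        else:
            # Inline element: keep it (and its tail text) in the running simpara.
            current.append(copy.deepcopy(child))
    if has_content(current):
        result.append(current)
    return result


para = ET.fromstring(
    "<para>Before <emphasis>the</emphasis> list:"
    "<itemizedlist><listitem><simpara>x</simpara></listitem></itemizedlist>"
    "After.</para>"
)
parts = split_para(para)
print([p.tag for p in parts])  # ['simpara', 'itemizedlist', 'simpara']
```

Note that the sketch drops any attributes on the original para (an
xml:id, for example), which is exactly the kind of detail a documented
canonical format would have to settle.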
Tables: In canonical DocBook, each table must have table column
specifications. Default values are replaced by explicit values.
[…]
which column it starts and where it ends without complex calculations.
The content of a table cell must be element-only.
It sounds like what you really want here isn’t even CALS (or HTML)
tables. You want the completely explicit internal format that the xslTNG
stylesheets generate during table processing. They turn the entire table
into a perfectly rectangular grid, using “ghost” elements for cells that
are missing.
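As a rough illustration of that idea (not the actual xslTNG code or its
internal format), here is a sketch in Python over a simplified,
HTML-like table model. The Cell and Slot types and the to_grid function
are hypothetical; here a "ghost" marks any grid position that has no
cell of its own, either because a span covers it or because the row is
short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    text: str
    colspan: int = 1
    rowspan: int = 1


@dataclass
class Slot:
    text: Optional[str]  # text of the owning cell, or None if no cell owns it
    ghost: bool          # True for every position that is not a cell's origin


def to_grid(rows: List[List[Cell]], ncols: int) -> List[List[Slot]]:
    """Expand rows of spanning cells into a full len(rows) x ncols grid."""
    grid: List[List[Optional[Slot]]] = [[None] * ncols for _ in rows]
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip positions already claimed by a rowspan from an earlier row.
            while c < ncols and grid[r][c] is not None:
                c += 1
            if c >= ncols:
                raise ValueError("row %d has too many cells" % r)
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    rr, cc = r + dr, c + dc
                    if rr < len(rows) and cc < ncols:  # clip spans at the table edge
                        grid[rr][cc] = Slot(cell.text, ghost=(dr, dc) != (0, 0))
            c += cell.colspan
    # Anything still unclaimed is a missing cell: fill it with an empty ghost.
    return [[s if s is not None else Slot(None, True) for s in row] for row in grid]


rows = [
    [Cell("A", rowspan=2), Cell("B", colspan=2)],
    [Cell("C")],
]
grid = to_grid(rows, ncols=3)
```

With every position filled in, a consumer can tell where a cell starts
and ends by simple indexing, without redoing the span arithmetic.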
That’s kind of true for a few of the other ideas you proposed, like the
inline markup.
After a while, this starts to feel less like a canonical DocBook and
more like a structural interchange format.
Images: Each image must have at least the attributes for image size
and scaling.
Getting those, if the author didn’t provide them, requires extensions
and is even then only speculative. I’m sure there are image formats I
can’t parse. Authors really should provide them.
P. S. This text was translated from German with
www.DeepL.com/Translator (free version).
Wow. It did a remarkably good job. I would not, on a casual reading,
have suspected autotranslation.
Be seeing you,
norm
--
Norman Tovey-Walsh<[email protected]>
https://nwalsh.com/
Before you criticize someone, walk a mile in his shoes. That way, when
you criticize him, you're a mile away and you have his shoes.