On 8/30/21 3:44 PM, Gavin Smith wrote:
On Mon, Aug 30, 2021 at 10:00:51AM -0700, Per Bothner wrote:
What I'm looking for is:
(1) Be able to post-process html output with xml tools, such as xslt.
(2) Generate valid epub3 ebooks.
These seem like valid goals, so I would be happy to see patches that produce
XML output, likely as an option.
What I would like is an option (and maybe even the default) for output that
is simultaneously valid both as HTML and as XML. At least "valid as HTML"
in the sense that it parses correctly according to the HTML5 parsing
specification and in any modern browser.
This is called "polyglot markup" by the way:
https://www.w3.org/TR/html-polyglot/ (no longer standards-track but still useful)
https://en.wikibooks.org/wiki/Polyglot_markup,_how_to
The key issues are just these two:
(1) Don't use named entities except the built-in XML ones.
(2) Close all tags. Where HTML prohibits separate closing tags,
use the XML shorthand, e.g. <hr/> . This works everywhere I've tried it.
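As a rough illustration of those two rules, here is a minimal Python sketch (stdlib only). It rewrites named entities other than the five predefined XML ones (amp, lt, gt, quot, apos) into numeric character references, then checks that a fragment parses as XML. The helper names are hypothetical, and a successful XML parse only shows well-formedness, not full polyglot conformance.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities predefined by XML; any other named entity must become numeric.
XML_BUILTIN_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}

def numeric_entities(text: str) -> str:
    """Replace HTML named entities (e.g. &nbsp;) with numeric references."""
    def repl(m):
        name = m.group(1)
        if name in XML_BUILTIN_ENTITIES or name not in name2codepoint:
            return m.group(0)  # keep XML built-ins and unknown names as-is
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

def is_well_formed_xml(fragment: str) -> bool:
    """True if the fragment parses as XML (tags closed, no unknown entities)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# "<p>a&nbsp;b<hr></p>" fails (undefined entity, unclosed <hr>), while
# numeric_entities("<p>a&nbsp;b<hr/></p>") yields "<p>a&#160;b<hr/></p>",
# which parses as XML and is also read the same way by an HTML5 parser.
```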
You also have to be careful about invalid characters in inline <script> or
<style> elements, but they should be separate files anyway.
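To show why inline scripts are the awkward case, a small Python sketch (hypothetical helper names, stdlib only): HTML treats <script> and <style> content as raw text with no entity decoding, while XML requires '<' and '&' to be escaped. So text such as "a < b" breaks the XML parse, and writing "a &lt; b" instead would reach the browser's script engine undecoded. The check below is only a heuristic for "safe" inline content, not a complete rule.

```python
import xml.etree.ElementTree as ET

def inline_text_is_polyglot_safe(code: str) -> bool:
    """Heuristic: inline script/style text should contain no '<' or '&'.

    HTML reads <script>/<style> content as raw text (no entity decoding),
    while XML needs '<' and '&' escaped, so neither form suits both parsers.
    """
    return "<" not in code and "&" not in code

def parses_as_xml(doc: str) -> bool:
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# "<script>if (a < b) f();</script>" is fine for HTML but not XML;
# rewriting the test as "b > a" works for both, since '>' is legal XML text.
```

Moving the code into an external file avoids the question entirely, which is the approach suggested above.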
--
--Per Bothner
http://per.bothner.com/