On Tue, 26 Jan 2016 20:52:09 +0100
Dirkjan Ochtman <d...@gentoo.org> wrote:

> All,
> 
> TL;DR: I think we should switch from DTD to RELAX NG (compact syntax,
> ideally) for our XML validation needs. It is more expressive and more
> readable.
> 
> Most people who know anything about XML stuff know that DTDs are not
> that great a solution for validation. Their expressive power is very
> limited; there are a few examples of this in our metadata.dtd [1].
> For a few years now, I've wanted to see if we could replace
> metadata.dtd with something in RELAX NG, which is a more modern XML
> schema language; it's an ISO standard with an emphasis on readability
> both for humans and for tools (by using a rigorous formalism). Some
> arguments in favor of RELAX NG (and some counter-arguments) are
> enumerated on Tim Bray's weblog [2]. I've created a compact syntax
> schema for metadata that can validate all metadata.xml files currently
> in the tree, as an example [3].
> 
> Some arguments against:
> 
> - Not enough tool support for RELAX NG: I'd be curious to hear what
> tools you want to use. At least libxml2 supports RELAX NG natively.
> The Python lxml library uses that support to provide pretty simple
> RELAX NG validation. libxml2 does not have native compact syntax
> support, but I maintain a simple library called rnc2rng [4] that is
> used transparently by lxml if installed. rnc2rng also comes with a
> rnc2rng command-line script to do the conversion.
> 
> - Performance: in a quick test with lxml (backed by libxml2), RELAX NG
> validation takes about as long as DTD validation. Testing with
> ~19000 metadata.xml files in the tree, with DTD (best of 3):
> 
> real    0m2.861s
> user    0m2.560s
> sys    0m0.296s
> 
> With RNC (best of 3):
> 
> real    0m3.058s
> user    0m2.688s
> sys    0m0.364s
> 
> We could probably easily maintain an XML Schema shadow schema if
> that's really desired, but I would be in favor of making RELAX NG our
> main schema language. I can easily do the work to update repoman for
> this (I've already refactored the metadata code in repoman). What
> other stuff would need to be updated?
> 
> Comments?

Could you post the generated .rng and XML Schema files for comparison?
They don't have to be perfect conversions, just to see how different
they are.
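
For anyone who wants to try the lxml route mentioned above, here is a
minimal sketch of RELAX NG validation using the XML syntax (.rng), which
libxml2 handles natively. The schema below is a made-up toy, not the real
metadata schema, and the is_valid helper is a hypothetical name used only
for illustration:

```python
from lxml import etree

# Hypothetical toy schema in RELAX NG XML syntax.
# This is NOT the real Gentoo metadata schema; it only illustrates the API.
RNG = b"""<element name="pkgmetadata"
                   xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="maintainer">
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>"""

# Compile the schema once so it can be reused for many documents.
schema = etree.RelaxNG(etree.fromstring(RNG))


def is_valid(xml_bytes):
    """Return True if the document conforms to the toy schema."""
    return schema.validate(etree.fromstring(xml_bytes))


good = (b"<pkgmetadata><maintainer><email>dev@example.org</email>"
        b"</maintainer></pkgmetadata>")
bad = b"<pkgmetadata><maintainer/></pkgmetadata>"  # <email> is missing

print(is_valid(good))
print(is_valid(bad))
# After a failed validation, schema.error_log holds the reasons.
print(schema.error_log)
```

Compiling the schema once and reusing it matters when checking thousands
of files in a loop, as in the timing test quoted above.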

-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>
