On Tue, 26 Jan 2016 20:52:09 +0100 Dirkjan Ochtman <d...@gentoo.org> wrote:
> All,
>
> TL;DR: I think we should switch from DTD to RELAX NG (compact syntax,
> ideally) for our XML validation needs. It is more expressive and more
> readable.
>
> Most people who know anything about XML stuff know that DTDs are not
> that great a solution for validation. Their expressive power is very
> limited; there are a few examples of this in our metadata.dtd [1].
> For a few years now, I've wanted to see if we could replace
> metadata.dtd with something in RELAX NG, which is a more modern XML
> schema language; it's an ISO standard with an emphasis on readability
> both for humans and for tools (by using a rigorous formalism). Some
> arguments in favor of RELAX NG (and some counter-arguments) are
> enumerated on Tim Bray's weblog [2]. I've created a compact syntax
> schema for metadata that can validate all metadata.xml files currently
> in the tree, as an example [3].
>
> Some arguments against:
>
> - Not enough tool support for RELAX NG: I'd be curious to hear what
> tools you want to use. At least libxml2 supports RELAX NG natively.
> The Python lxml library uses that support to provide pretty simple
> RELAX NG validation. libxml2 does not have native compact syntax
> support, but I maintain a simple library called rnc2rng [4] that is
> used transparently by lxml if installed. rnc2rng also comes with a
> rnc2rng command-line script to do the conversion.
>
> - Performance: in a quick test with lxml (backed by libxml2), RELAX NG
> validation takes very similar time compared to DTD. Testing with
> ~19000 metadata.xml files in the tree, with DTD (best of 3):
>
> real 0m2.861s
> user 0m2.560s
> sys 0m0.296s
>
> With RNC (best of 3):
>
> real 0m3.058s
> user 0m2.688s
> sys 0m0.364s
>
> We could probably easily maintain an XML Schema shadow schema if
> that's really desired, but I would be in favor of making RELAX NG our
> main schema language. I can easily do the work to update repoman for
> this (I've already refactored the metadata code in repoman). What
> other stuff would need to be updated?
>
> Comments?

Could you post generated .rng and XML Schema files for comparison?
They don't have to be perfect conversions, just to see how different
they are.

-- 
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>
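
For reference, the lxml-based RELAX NG validation mentioned in the quoted
mail could look roughly like the sketch below. It is only an illustration:
the schema is a made-up, heavily simplified stand-in (not the schema from
[3]), it uses the XML syntax so that rnc2rng is not required, and the
helper names make_validator and is_valid are hypothetical.

```python
import io

from lxml import etree

# Hypothetical, heavily simplified schema in RELAX NG XML syntax.
# This is NOT the real metadata schema; it only illustrates the mechanics.
RNG_SCHEMA = b"""\
<element name="pkgmetadata" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="maintainer">
      <element name="email"><text/></element>
      <optional>
        <element name="name"><text/></element>
      </optional>
    </element>
  </zeroOrMore>
</element>
"""


def make_validator(schema_bytes):
    """Parse a RELAX NG (XML syntax) schema and build an lxml validator."""
    schema_doc = etree.parse(io.BytesIO(schema_bytes))
    return etree.RelaxNG(schema_doc)


def is_valid(validator, xml_bytes):
    """Return True if the given XML document matches the schema."""
    doc = etree.parse(io.BytesIO(xml_bytes))
    return validator.validate(doc)


if __name__ == "__main__":
    validator = make_validator(RNG_SCHEMA)
    good = (b"<pkgmetadata><maintainer><email>dev@example.org</email>"
            b"</maintainer></pkgmetadata>")
    # The second document omits the required <email> element.
    bad = (b"<pkgmetadata><maintainer><name>No Email</name>"
           b"</maintainer></pkgmetadata>")
    print(is_valid(validator, good))
    print(is_valid(validator, bad))
```

Building the validator once and reusing it for every file is the obvious
choice when checking many metadata.xml files in one run, since the schema
is parsed and compiled only once.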