Re: [Boston.pm] Xpath query acceleration

Ricker, William Wed, 16 Jun 2010 17:14:20 -0700

> That wouldn't work, as the document contains items that aren't in the
> schema,


I recall that was the whole point of your walking it yourself rather
than using a yes-no external oracular validator, yes. 

To truly validate to a schema, you must recursively parse the file under
control of the annotative grammar which is the schema. I don't expect
querying the schema by XPath to work in the general case, but your schema
may be such that you know it will.

If it's as easy as determining that this tag isn't even in our lexicon,
or our tag of this name doesn't have this attribute, or our tag of this
name is only allowed with certain parents that do not include the
current document's usage, you should be able to mark #FAIL as you go,
whichever of the schema or the document is driving the walk. You could
preprocess the schema into a hash of rules to validate, possibly using
XPath as semantics for magic string values.
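
A minimal sketch of the "hash of rules" idea, in Python with only the
standard library; the same shape maps onto a Perl hash of hashes. The
RULES table, the tag names and the check() helper are all invented for
illustration and are not taken from any real schema. It covers only the
three easy checks named above (unknown tag, unknown attribute,
disallowed parent), not full schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, as if preprocessed from a schema. None as a parent
# means "allowed as the document root".
RULES = {
    "book":    {"attrs": {"id"},    "parents": {None}},
    "chapter": {"attrs": {"title"}, "parents": {"book"}},
    "para":    {"attrs": set(),     "parents": {"chapter"}},
}

def check(elem, parent_tag=None, path=""):
    """Walk the document, yielding (path, reason) for each local failure."""
    here = f"{path}/{elem.tag}"
    rule = RULES.get(elem.tag)
    if rule is None:
        yield (here, "tag not in lexicon")
    else:
        for name in elem.attrib:
            if name not in rule["attrs"]:
                yield (here, f"attribute {name!r} not allowed")
        if parent_tag not in rule["parents"]:
            yield (here, f"not allowed under {parent_tag!r}")
    # Keep descending even below a failed node, so nothing goes unvisited.
    for child in elem:
        yield from check(child, elem.tag, here)

doc = ET.fromstring(
    '<book id="b1"><chapter title="One" draft="yes">'
    '<para/><sidebar/></chapter><para/></book>'
)
failures = list(check(doc))
```

Here the document drives the walk and the rules are looked up per node,
so items missing from the schema show up as failures rather than being
silently skipped.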

If the schema is sufficiently abstract that you need backtracking search
to determine what bits of the document XML have to be excluded for the
rest to validate, you have a horror on your hands.  



> so you'd have to keep track of which nodes you validated, 
> and then still query or walk the document looking for things 
> that weren't covered by the schema.

You can delete them when they fail validation, which is what I thought
you said you'd do, or keep a list of refs to what fails.
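
Both options can be combined in one pass. The sketch below, again
Python with the standard library, detaches failing nodes and keeps
references to them. The prune() helper, the KNOWN set and the sample
document are made up for illustration, and "valid" is reduced to "tag
is in a known set" as a stand-in for whatever per-node test applies.

```python
import xml.etree.ElementTree as ET

def prune(elem, is_valid, removed=None):
    """Detach descendants that fail is_valid; return the detached elements.

    The root element itself is not tested, only its descendants.
    """
    if removed is None:
        removed = []
    for child in list(elem):        # copy, since elem is mutated in the loop
        if is_valid(child):
            prune(child, is_valid, removed)
        else:
            elem.remove(child)      # the whole failing subtree goes with it
            removed.append(child)   # keep a reference for later reporting
    return removed

KNOWN = {"book", "chapter", "para"}   # hypothetical lexicon
doc = ET.fromstring(
    "<book><chapter><para/><sidebar><para/></sidebar></chapter><note/></book>"
)
removed = prune(doc, lambda e: e.tag in KNOWN)
```

After the call, what remains in doc passed the test, and removed holds
the detached subtrees, so no second walk is needed to find what the
rules did not cover.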

> I haven't tried using XML::Twig::XPath, but it deviates from the DOM
> API, which can make it harder to port your code to a different XML
> library.

Yes, it's a just-barely-sufficient-magic hack, not a 100% solution. XML
was supposed to be less overly-general than SGML, but it's still too
too, too.

bill

