[Boston.pm] Xpath query acceleration

Tom Metro Wed, 16 Jun 2010 10:55:07 -0700

I have some code that uses XML::LibXML to walk the nodes of an XML
document, then uses the findnodes() method to run XPath queries against
an XSD schema file to validate each element/attribute/enumerated value.


(While there are XML::LibXML methods and external tools to validate XML
against a schema, I needed something that provided granular feedback, as
the objective is to filter out items that don't comply with the schema.)

It works fine, except that it seems to be really slow, and I suspect the
cause is the repeated XPath queries into the schema.

I thought of using Memoize, but it would be challenging, because the
code does stuff like:
   $type_def->findnodes( q{.//xs:attribute[@name='} . $att_name . q{']} )->shift

where $type_def is a previously looked up sub-branch of the schema, and
the XPath is searching relative to that point. Code using references
doesn't lend itself to memoization.
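One possible way around the reference problem is a hand-rolled cache keyed
on a string that identifies the node, rather than on the reference itself.
The sketch below is only an illustration: find_attribute_def() and
%attr_cache are hypothetical names, the XPath is assumed to be the
xs:attribute lookup shown above, and nodePath() is assumed to be cheap
enough relative to the query it replaces. It also assumes a single schema
document that is not modified after loading.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical cache: "<path of $type_def>\0<attribute name>" => node or undef.
my %attr_cache;

sub find_attribute_def {
    my ($type_def, $att_name) = @_;

    # nodePath() returns a string locating the node within its document,
    # so it can stand in for the reference as part of a hash key.
    my $key = join "\0", $type_def->nodePath, $att_name;

    # Test with exists() so that "not found" (undef) results are cached too.
    unless (exists $attr_cache{$key}) {
        $attr_cache{$key} = $type_def->findnodes(
            q{.//xs:attribute[@name='} . $att_name . q{']}
        )->shift;
    }
    return $attr_cache{$key};
}

# Tiny stand-in schema, just to exercise the function.
my $schema = XML::LibXML->new->parse_string(<<'XSD');
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="itemType">
    <xs:attribute name="id" type="xs:string"/>
  </xs:complexType>
</xs:schema>
XSD

my ($type_def) = $schema->documentElement->findnodes(
    q{//xs:complexType[@name='itemType']}
);

my $first   = find_attribute_def($type_def, 'id');     # runs the XPath query
my $second  = find_attribute_def($type_def, 'id');     # served from %attr_cache
my $missing = find_attribute_def($type_def, 'nope');   # undef, also cached
```

Whether this helps depends on how often the same ($type_def, $att_name)
pair recurs while walking the document.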

The other obvious approach is to forget XPath and instead pre-process
the schema into a Perl data structure.
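As a rough sketch of what that pre-processing might look like, the code
below makes one pass over the schema and builds a hash of hashes, so that
later lookups are plain hash accesses. index_schema_attributes() is a
hypothetical helper. It assumes named, top-level style xs:complexType
definitions with attributes declared inside them, and it ignores things a
real schema may use, such as xs:attributeGroup references or type
extension.

```perl
use strict;
use warnings;
use XML::LibXML;

# Build { type name => { attribute name => xs:attribute node } } in one pass.
sub index_schema_attributes {
    my ($schema_doc) = @_;
    my %index;

    for my $type ($schema_doc->documentElement->findnodes('//xs:complexType[@name]')) {
        my %attrs;
        for my $attr ($type->findnodes('.//xs:attribute[@name]')) {
            $attrs{ $attr->getAttribute('name') } = $attr;
        }
        $index{ $type->getAttribute('name') } = \%attrs;
    }
    return \%index;
}

# Tiny stand-in schema, just to exercise the function.
my $schema = XML::LibXML->new->parse_string(<<'XSD');
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="itemType">
    <xs:attribute name="id" type="xs:string"/>
    <xs:attribute name="color" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="boxType">
    <xs:attribute name="width" type="xs:int"/>
  </xs:complexType>
</xs:schema>
XSD

my $index = index_schema_attributes($schema);

# Validation then becomes a hash lookup instead of an XPath query.
my $known   = exists $index->{itemType}{color} ? 1 : 0;
my $unknown = exists $index->{itemType}{bogus} ? 1 : 0;
```

The same idea would extend to elements and enumerated values by adding
further keys to the per-type hash.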

Before I do either, I'll profile the code to see if my suspicion about
the XPath queries is correct. (Though I'm pretty confident that's the
issue. I wouldn't expect calls to $node->attributes or $node->childNodes
to be slow.)
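Short of running a full profiler such as Devel::NYTProf, a quick check is
to wrap the suspect calls in a timer and compare the totals. The sketch
below uses only the core Time::HiRes module. timed(), %elapsed and %calls
are hypothetical names, and the busy loop is a stand-in for the real
findnodes() or childNodes calls.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my (%elapsed, %calls);

# Run $code, adding its wall-clock time to the bucket named $label.
# Note: the callback is always invoked in list context.
sub timed {
    my ($label, $code) = @_;
    my $start  = time();
    my @result = $code->();
    $elapsed{$label} += time() - $start;
    $calls{$label}++;
    return wantarray ? @result : $result[0];
}

# Stand-in workload. In the real code the callback would wrap a call such
# as $type_def->findnodes(...) or $node->attributes.
my $sum = timed( busy_loop => sub {
    my $n = 0;
    $n += $_ for 1 .. 100_000;
    return $n;
} );

# Report call counts and accumulated time per label.
printf "%-10s %5d calls %.6f s\n", $_, $calls{$_}, $elapsed{$_}
    for sort keys %elapsed;
```

Using one label for the XPath lookups and another for the plain DOM
accessors would show directly which bucket dominates.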

(Unlike some Perl XML libraries, if I recall correctly XML::LibXML
always calls into a native C library, and doesn't have a pure-Perl
fallback, so I don't think I'm seeing a speed hit due to that happening.
However, the version of XML::LibXML I'm using is 2007-era, so that
could be a problem.)

 -Tom

-- 
Tom Metro
Venture Logic, Newton, MA, USA
"Enterprise solutions through open source."
Professional Profile: http://tmetro.venturelogic.com/

_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm
