I was recently trying to use generateDS with lxml. By default, generateDS
parses the XML with a SAX parser and creates a minidom Node object. It then
traverses the nodes, starting from the root, to build the required object
according to a schema. The first part (parsing the XML string) can be
easily converted to lxml, which returns an lxml etree element object. However,
I ran into problems traversing this object with the generateDS code.
What I found is that, although the algorithm is generic and could be used to
traverse any kind of node, the code itself is tightly coupled to the minidom
node. For example, it uses functions like "getChildren()", attributes like
"nodeValue" and "nodeType", and node types like "ELEMENT_NODE" or "TEXT_NODE",
which are specific to minidom and are not found in other node
implementations, such as the node returned by lxml parsing.
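
To illustrate the mismatch, here is a minimal sketch (not generateDS code) of
the same small traversal written twice. I use the standard library's
xml.etree.ElementTree as a stand-in for lxml.etree, on the assumption that
lxml elements expose the same "tag", "text" and iteration interface; the
function names and the sample XML are made up for the example.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)

XML = "<person><name>Ann</name><age>30</age></person>"


def child_text_minidom(root):
    """Collect {tag: text} for child elements using minidom-only names."""
    result = {}
    for child in root.childNodes:
        # minidom mixes element and text nodes, so the type must be checked.
        if child.nodeType == child.ELEMENT_NODE:
            text = "".join(
                n.nodeValue
                for n in child.childNodes
                if n.nodeType == n.TEXT_NODE
            )
            result[child.nodeName] = text
    return result


def child_text_etree(root):
    """Same result using the ElementTree-style interface."""
    result = {}
    for child in root:  # iterating an element yields its child elements
        result[child.tag] = child.text or ""
    return result


dom_root = minidom.parseString(XML).documentElement
et_root = ET.fromstring(XML)
```

Both functions produce the same dictionary for this input, but neither can
be handed the other parser's node: the etree element has no "nodeType" or
"childNodes", and the minidom node has no "tag" or "text".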

The core functionality of the generateDS module should be separated from the
type of node being operated on, so that the module becomes node-agnostic
and the same generateDS functions can be integrated with any parsing module
- lxml, SAX, or anything else. This is especially important since lxml
parses much faster than minidom (I noticed speed-ups of almost 100 times),
especially for large XML files of over 30-40 MB.
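
One possible way to get there, as a rough sketch only: put a thin adapter
around each node type and let the build code talk to the adapter. The names
MinidomAdapter, EtreeAdapter and build() below are hypothetical and are not
part of generateDS; xml.etree.ElementTree again stands in for lxml.etree,
and the nested-dict output is just a placeholder for whatever object the
schema would require.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)


class MinidomAdapter:
    """Hypothetical wrapper exposing a small, parser-neutral interface."""

    def __init__(self, node):
        self._node = node

    def tag(self):
        return self._node.nodeName

    def attributes(self):
        return dict(self._node.attributes.items())

    def text(self):
        # Concatenate the direct text children of this element.
        return "".join(
            n.nodeValue
            for n in self._node.childNodes
            if n.nodeType == n.TEXT_NODE
        )

    def children(self):
        return [
            MinidomAdapter(n)
            for n in self._node.childNodes
            if n.nodeType == n.ELEMENT_NODE
        ]


class EtreeAdapter:
    """Same interface on top of an ElementTree-style element."""

    def __init__(self, node):
        self._node = node

    def tag(self):
        return self._node.tag

    def attributes(self):
        return dict(self._node.attrib)

    def text(self):
        # Leading text plus the tail text that follows each child element.
        parts = [self._node.text or ""]
        parts.extend(child.tail or "" for child in self._node)
        return "".join(parts)

    def children(self):
        return [EtreeAdapter(child) for child in self._node]


def build(node):
    """Node-agnostic traversal: only the adapter interface is used."""
    return {
        "tag": node.tag(),
        "attrs": node.attributes(),
        "text": node.text().strip(),
        "children": [build(child) for child in node.children()],
    }


XML = '<order id="7"><item sku="a1">pen</item><item sku="b2">ink</item></order>'
from_minidom = build(MinidomAdapter(minidom.parseString(XML).documentElement))
from_etree = build(EtreeAdapter(ET.fromstring(XML)))
```

With this arrangement build() never touches "nodeType" or "ELEMENT_NODE"
directly, so supporting another parser would mean writing one more adapter
rather than changing the traversal.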

_______________________________________________
generateds-users mailing list
generateds-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/generateds-users