There is a new version of generateDS.py -- v. 2.0a. The most significant change is that the generated code uses lxml/ElementTree instead of minidom. The generated code automatically uses lxml if it is available; if not, cElementTree; if not, ElementTree. See the generated code for the details.
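
For anyone curious, the fallback order described above can be sketched roughly as follows. This is only an illustration of the idea, not a copy of what generateDS.py emits; the names etree_ and backend are ones I made up for this example, and the generated code may differ in its details.

```python
# Minimal sketch of the import fallback: lxml, then cElementTree, then
# ElementTree. The names etree_ and backend are hypothetical.
try:
    from lxml import etree as etree_
    backend = "lxml"
except ImportError:
    try:
        # cElementTree was removed in Python 3.9; on newer interpreters
        # this import fails and we fall through to ElementTree.
        import xml.etree.cElementTree as etree_
        backend = "cElementTree"
    except ImportError:
        import xml.etree.ElementTree as etree_
        backend = "ElementTree"

# Whichever library was found, the basic parsing call is the same.
root = etree_.fromstring("<people><person>Alice</person></people>")
print("Using %s, root element is <%s>" % (backend, root.tag))
```

Because all three libraries share the ElementTree API for basic parsing, code written against etree_ does not need to know which one was imported, which is also why it is best to stay away from lxml-only features.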
Another reasonably significant change is that the parse functions (parse, parseString, and parseLiteral) in the generated code now automatically recognize the root element and use the appropriate (generated) class to build the instance. More notes from the README are below.

I bumped the version number to 2.x because the 1.x series was getting a bit old and because the zero in 2.0 should be a small warning that this version has significant code changes, so "use at your own risk", "buyer beware", and "don't give up your day job". More seriously, it would be a good idea to hang onto the previous version until you are sure that this one does what you want.

One difference between the old and new versions is that lxml and ElementTree do not preserve CDATA sections. Instead, they escape special characters as XML entities. I believe that is what we will most often want. But, if your application depends on CDATA sections, you will need to make some adjustments. Lxml, but not ElementTree, has an option to preserve CDATA sections. But, (1) it's best that we not use capabilities that are supported by lxml but not ElementTree and (2) I haven't figured out how to use lxml's CDATA objects anyway. See here for more on this:

    http://codespeak.net/lxml/api.html#cdata

Biswanath Patel and Jaime Huerta Cepas promoted the switch to lxml/ElementTree. Thanks to you both for the motivation.

Here are the notes from the README:

Version 2.0a (6/21/2010)

* Switched to use of lxml/ElementTree in generated files. Thanks to Biswanath Patel and Jaime Huerta Cepas for encouraging me to implement the switch to lxml/ElementTree.
* Modified the generation of functions parse(), parseString(), and parseLiteral() so that they automatically recognize the root element of an instance XML document and call the build method of the appropriate class.
* Fix to hasContent_ method so that, in elements defined with extension-base, the superclass is checked also.
* For classes that must call an overridden method m in the superclass, switched to use "super(superclassname, self).m(...)" instead of "superclassname.m(self, ...)".
* Known issues -- (1) generateDS.py loops and crashes with "RuntimeError: maximum recursion depth exceeded" on some schemas (for example collada_schema_1_4.xsd). (2) Failure in process_includes.py with import of remote file and nested imports (for example collada_schema_1_5.xsd).

By the way, one of the reasons for switching to lxml/ElementTree is the hope for increased speed on large documents. So, if anyone does a timing comparison, please let me know about the results.

Here are some results on a 3.5 MB input XML instance document. I commented out the lines that do the export so that the test does little more than parse and build. tmp3sup.py uses minidom; tmp19sup.py uses lxml:

    $ time python tmp3sup.py big1.xml
    real 0m8.482s
    user 0m7.990s
    sys 0m0.320s

    $ time python tmp19sup.py big1.xml
    real 0m1.244s
    user 0m1.080s
    sys 0m0.130s

So, for large documents, the speedup is significant: roughly a factor of seven in elapsed time on this document.

For an even larger document (38 MB), the minidom version made my machine slow to a crawl, and I eventually had to kill it. The lxml version took this long (on the 38 MB doc):

    $ time python tmp19sup.py big.xml
    real 0m13.645s
    user 0m12.590s
    sys 0m0.710s

- Dave

--
Dave Kuhlman
http://www.rexx.com/~dkuhlman

_______________________________________________
generateds-users mailing list
generateds-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/generateds-users