There is a new version of generateDS.py -- v. 2.0a.

The most significant change is that the generated code uses
lxml/ElementTree instead of minidom.  The generated code
automatically uses lxml if available; if not, cElementTree if
available; if not, ElementTree if available.  See the generated
code for the details.
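
As a rough illustration of that fallback order, here is a minimal
sketch.  It is not copied from the generated code; the alias etree_ and
the variable XML_LIB are names assumed only for this example.

```python
# Try the fastest available implementation first, then fall back.
# "etree_" and "XML_LIB" are illustrative names, not taken from generateDS.
try:
    from lxml import etree as etree_
    XML_LIB = "lxml"
except ImportError:
    try:
        # cElementTree exists only on older Pythons (it was removed in 3.9).
        import xml.etree.cElementTree as etree_
        XML_LIB = "cElementTree"
    except ImportError:
        import xml.etree.ElementTree as etree_
        XML_LIB = "ElementTree"

# Whichever library was found, the basic parsing API is the same.
root = etree_.fromstring("<people><person>Alice</person></people>")
print(XML_LIB, root.tag)
```

Because all three libraries share the ElementTree API for basic
parsing, the rest of the code can use the alias without caring which
one was imported.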

Another reasonably significant change is that the parse functions
(parse, parseString, and parseLiteral) in the generated code now
automatically recognize the root element and use the appropriate
(generated) class to build the instance.
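
The idea can be sketched roughly as follows.  This is only an
illustration of dispatching on the root tag, not the code that
generateDS.py emits; the classes Person and Company, the table
ROOT_CLASSES, and the function parse_string_sketch are all hypothetical.

```python
import xml.etree.ElementTree as etree_


class Person(object):
    """Hypothetical stand-in for a generated class."""
    def __init__(self):
        self.name = None

    def build(self, node):
        self.name = node.findtext("name")


class Company(object):
    """Hypothetical stand-in for a second generated class."""
    def __init__(self):
        self.title = None

    def build(self, node):
        self.title = node.findtext("title")


# Hypothetical lookup table from root element name to class.
ROOT_CLASSES = {"person": Person, "company": Company}


def parse_string_sketch(text):
    """Parse text, pick the class from the root tag, and build it."""
    root_node = etree_.fromstring(text)
    # Drop a "{namespace}" prefix if the document uses one.
    tag = root_node.tag.split("}", 1)[-1]
    root_class = ROOT_CLASSES.get(tag)
    if root_class is None:
        raise ValueError("no class for root element %r" % tag)
    obj = root_class()
    obj.build(root_node)
    return obj


person = parse_string_sketch("<person><name>Alice</name></person>")
print(type(person).__name__, person.name)
```

The caller no longer has to know in advance which class matches the
document; the root element name decides.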

More notes from the README are below.

I bumped the version number to 2.x because the 1.x series was
getting a bit old and because the zero in 2.0 should be a small warning
that this version has significant code changes, so "use at your own
risk", "buyer beware", and "don't give up your day job".  More
seriously, it would be a good idea to hang onto the previous
version until you are sure that this one does what you want.

One difference between the old and new versions is that lxml and
ElementTree do not preserve CDATA sections.  Instead, they escape
special characters with XML entity references.  I believe that is
what we will most often want.  But, if your application depends on
CDATA sections, you will need to make some adjustments.
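
A small example shows the effect.  It uses the standard library
ElementTree; the sample document is made up for this illustration.

```python
import xml.etree.ElementTree as etree_

# A made-up document whose content is wrapped in a CDATA section.
source = "<note><![CDATA[x < y & z]]></note>"

root = etree_.fromstring(source)
text = root.text
exported = etree_.tostring(root, encoding="unicode")

print(text)      # x < y & z
print(exported)  # <note>x &lt; y &amp; z</note>
```

The text content survives the round trip, but the CDATA wrapper is
gone and the special characters are escaped on export.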

Lxml, but not ElementTree, has an option to preserve CDATA
sections.  But, (1) it's best that we not use capabilities that are
supported by lxml but not ElementTree and (2) I haven't figured out
how to use lxml's CDATA objects anyway.  See here for more on this:
http://codespeak.net/lxml/api.html#cdata
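
For anyone who wants to experiment, here is a minimal sketch of the
lxml-only route, going by the lxml documentation (the strip_cdata
parser option and the etree.CDATA wrapper).  The helper function names
are hypothetical, and the code returns None when lxml is not
installed.  It is not something the generated code does.

```python
try:
    from lxml import etree
except ImportError:
    etree = None  # lxml is not installed; the helpers below return None.


def roundtrip_with_cdata(source):
    """Parse and re-serialize source, keeping CDATA sections (lxml only)."""
    if etree is None:
        return None
    parser = etree.XMLParser(strip_cdata=False)
    root = etree.fromstring(source, parser)
    return etree.tostring(root, encoding="unicode")


def make_cdata_element(tag, text):
    """Create an element whose text is written as a CDATA section."""
    if etree is None:
        return None
    element = etree.Element(tag)
    element.text = etree.CDATA(text)
    return etree.tostring(element, encoding="unicode")


print(roundtrip_with_cdata("<note><![CDATA[x < y]]></note>"))
print(make_cdata_element("note", "a & b"))
```

Neither call has an equivalent in ElementTree, so code that relies on
them only works where lxml is installed.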

Biswanath Patel and Jaime Huerta Cepas promoted the switch to
lxml/ElementTree.  Thanks to you both for the motivation.

Here are the notes from the README:

    Version 2.0a (6/21/2010)
      * Switched to use of lxml/ElementTree in generated files. 
        Thanks to Biswanath Patel and Jaime Huerta Cepas for
        encouraging me to implement the switch to lxml/ElementTree.
      * Modified the generation of functions parse(), parseString(),
        and parseLiteral() so that they automatically recognize the
        root element of an instance XML document and call the build
        method of the appropriate class.
      * Fix to hasContent_ method so that in elements defined
        with extension-base, the superclass is checked also.
      * For classes that must call an overridden method m in the
        superclass, switched to use "super(superclassname, self).m(...)"
        instead of "superclassname.m(self, ...)".
      * Known issues -- (1) generateDS.py loops and crashes with
        "RuntimeError: maximum recursion depth exceeded" on some
        schemas (for example collada_schema_1_4.xsd).  (2) Failure in
        process_includes.py with import of remote file and nested
        imports (for example collada_schema_1_5.xsd).
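
To illustrate the hasContent_ and super() items above, here is a
minimal sketch.  The classes Base and Extended and their members are
hypothetical; they only stand in for classes generated from a type and
its extension.

```python
class Base(object):
    """Hypothetical class generated for the base type."""
    def __init__(self, name=None):
        self.name = name

    def hasContent_(self):
        return self.name is not None


class Extended(Base):
    """Hypothetical class generated for a type that extends Base."""
    def __init__(self, name=None, note=None):
        super(Extended, self).__init__(name)
        self.note = note

    def hasContent_(self):
        # Check this class's own members, then ask the superclass.
        return (self.note is not None or
                super(Extended, self).hasContent_())


print(Extended(name="Alice").hasContent_())
print(Extended().hasContent_())
```

Without the call to the superclass, an instance that has only
inherited content would be reported as empty.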

By the way, one of the reasons for switching to lxml/ElementTree is
the hope for increased speed on large documents.  So, if anyone
does a timing comparison, please let me know about the results.

Here are some results on a 3.5 MB input XML instance document.  I
commented out the lines that do the export so that the test does
little more than parse and build.  tmp3sup.py uses minidom;
tmp19sup.py uses lxml:

    $ time python tmp3sup.py big1.xml

    real    0m8.482s
    user    0m7.990s
    sys     0m0.320s
    $ time python tmp19sup.py big1.xml

    real    0m1.244s
    user    0m1.080s
    sys     0m0.130s

So, for large documents, the speedup is significant.  For an even
larger document (38 MB), the minidom version made my machine slow to a
crawl, and I eventually had to kill it.  The lxml version took this
long (on the 38 MB doc):

    $ time python tmp19sup.py big.xml

    real    0m13.645s
    user    0m12.590s
    sys     0m0.710s
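
If you want to try a comparison of your own without generating code
first, something like the following gives a rough idea.  It is only a
sketch: it times raw parsing with minidom and the standard library
ElementTree on a made-up document, not the generated build code, and
the function names are hypothetical.

```python
import time
import xml.dom.minidom
import xml.etree.ElementTree as etree_


def make_document(count):
    """Build a made-up test document with count child elements."""
    items = "".join(
        "<item id='%d'>value %d</item>" % (i, i) for i in range(count))
    return "<items>%s</items>" % items


def time_parse(parse_func, text):
    """Return the seconds taken by one call to parse_func(text)."""
    start = time.perf_counter()
    parse_func(text)
    return time.perf_counter() - start


text = make_document(20000)
minidom_seconds = time_parse(xml.dom.minidom.parseString, text)
etree_seconds = time_parse(etree_.fromstring, text)
print("minidom:     %.3f s" % minidom_seconds)
print("ElementTree: %.3f s" % etree_seconds)
```

Numbers from a single run vary quite a bit, so repeat the measurement
a few times, and use a document of realistic size, before drawing
conclusions.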

- Dave


 -- 


Dave Kuhlman
http://www.rexx.com/~dkuhlman

