>From: Ashutosh Goyal <ashutoshgo...@gmail.com>
>To: generateds-users@lists.sourceforge.net
>Sent: Thu, March 18, 2010 1:50:50 AM
>Subject: [Generateds-users] large xml parser
>

> I am looking for most efficient way of parsing a big xml (50MB)
> creating python object.  Current generateDS version is using
> minidom or sax parser.  Is there any faster solution.  What about
> lxml?

Ashutosh -

Yes, generateDS.py uses SAX to parse the XML Schema, *but* the code
that it generates uses minidom to parse the XML instance document.

Specifically, the generated code performs the following steps:

1. Parse the document with minidom.

2. Walk the minidom tree to create instances of the generated
   classes.

If speed is your primary need, you'd probably want to avoid the
second step above.  That is, skip the generated classes and work
directly with the tree that the parser gives you.
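
To make the two steps concrete, here is a minimal sketch.  It is not
the code that generateDS.py actually emits; the class name Item, its
build() method, and parse_string() are hypothetical stand-ins, used
only to show where the extra tree walk (and its cost) comes from.

```python
from xml.dom import minidom


class Item:
    """Hypothetical stand-in for a generated class (not real generateDS output)."""

    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children if children is not None else []

    @classmethod
    def build(cls, node):
        # Step 2: walk the minidom tree, creating one object per element.
        obj = cls(name=node.tagName)
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                obj.children.append(cls.build(child))
        return obj


def parse_string(xml_text):
    # Step 1: parse the whole document with minidom.
    doc = minidom.parseString(xml_text)
    # Step 2: build instances from the resulting DOM tree.
    return Item.build(doc.documentElement)


root = parse_string("<a><b/><c><d/></c></a>")
print(root.name, [child.name for child in root.children])
```

The second pass visits every element again and allocates a Python
object for each one, which is the work you save by using the parser's
own tree directly.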

So, I'd consider the following:

- minidom -- But, don't forget that there is C code (expat) underneath
  minidom.

- ElementTree

- cElementTree -- Faster than plain ElementTree.

- lxml -- The following link gives some speed comparisons:

       http://codespeak.net/lxml/performance.html
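
For comparison, here is a small sketch of the same task (counting
"item" elements) done with each of these libraries.  The sample
document and the function names are made up for illustration.  The
lxml import is optional because lxml is a third-party package.  Note
also that in Python 3.3 and later, xml.etree.ElementTree picks up the
C accelerator on its own, and the separate cElementTree module was
removed in Python 3.9, so it is not imported here.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

try:
    from lxml import etree as lxml_etree  # third-party; may not be installed
except ImportError:
    lxml_etree = None

# Made-up sample document, for illustration only.
SAMPLE = b"<root><item id='1'/><item id='2'/></root>"


def count_minidom(data):
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName("item"))


def count_elementtree(data):
    root = ET.fromstring(data)
    return len(root.findall("item"))


def count_lxml(data):
    # Only usable when lxml is installed; its API mirrors ElementTree here.
    root = lxml_etree.fromstring(data)
    return len(root.findall("item"))


print(count_minidom(SAMPLE), count_elementtree(SAMPLE))
if lxml_etree is not None:
    print(count_lxml(SAMPLE))
```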

I tried parsing a 22 MB file.  Here are the times:

minidom:

    real    0m1.426s
    user    0m1.200s
    sys     0m0.220s

lxml:

    real    0m0.195s
    user    0m0.130s
    sys     0m0.070s

ElementTree:

    real    0m0.446s
    user    0m0.390s
    sys     0m0.060s

cElementTree:

    real    0m0.271s
    user    0m0.230s
    sys     0m0.040s

Of course, times will vary depending on the contents of the
XML document and what you do after parsing it.
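
If you want to run a comparison on your own data, something like the
following sketch could be a starting point.  It is not the script
behind the numbers above; it builds a small synthetic document in
memory (make_document and time_parse are hypothetical helpers) and
times only the parse call, using the standard library parsers.  For a
real test, read your 50 MB file into `data` instead.

```python
import time
from xml.dom import minidom
import xml.etree.ElementTree as ET


def make_document(n_items):
    # Build a synthetic XML document as bytes; a stand-in for a real file.
    parts = ["<root>"]
    parts.extend('<item id="%d">value %d</item>' % (i, i) for i in range(n_items))
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def time_parse(parse_func, data):
    # Return the wall-clock seconds taken by one call to parse_func(data).
    start = time.perf_counter()
    parse_func(data)
    return time.perf_counter() - start


data = make_document(10000)
timings = {
    "minidom": time_parse(minidom.parseString, data),
    "ElementTree": time_parse(ET.fromstring, data),
}

# Print the fastest parser first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print("%-12s %.4fs" % (name, seconds))
```

A single run like this is noisy, so repeat it a few times before
drawing conclusions.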

Hope this helps.

- Dave


 -- 


Dave Kuhlman
http://www.rexx.com/~dkuhlman
