>From: Ashutosh Goyal <[email protected]>
>To: [email protected]
>Sent: Thu, March 18, 2010 1:50:50 AM
>Subject: [Generateds-users] large xml parser
>
> I am looking for the most efficient way of parsing a big XML file
> (50 MB) and creating Python objects. The current generateDS version
> uses the minidom or SAX parser. Is there a faster solution? What
> about lxml?
Ashutosh -
Yes, generateDS.py uses SAX to parse the XML Schema, *but* the code
that it generates uses minidom to parse the XML instance document.
Specifically, the generated code performs the following steps:
1. Parse the document with minidom.
2. Walk the minidom tree to create instances of the generated
classes.
If speed is your primary need, you'd probably want to avoid the
second step above.
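
To make the two steps concrete, here is a rough sketch of what that pattern looks like. It is not the code that generateDS.py actually emits: the Person class, its build() method, and the sample document are hypothetical stand-ins for a generated class and your data.

```python
from xml.dom import minidom


class Person:
    """Hypothetical stand-in for a class generated by generateDS.py."""

    def __init__(self, name=None, age=None):
        self.name = name
        self.age = age

    @classmethod
    def build(cls, node):
        # Step 2: copy data out of a minidom element into a plain object.
        name = node.getAttribute("name") or None
        age_text = node.getAttribute("age")
        return cls(name=name, age=int(age_text) if age_text else None)


def parse_people(xml_text):
    # Step 1: parse the whole document into a minidom tree.
    doc = minidom.parseString(xml_text)
    # Step 2: walk the tree and create one object per <person> element.
    return [Person.build(n) for n in doc.getElementsByTagName("person")]


# Hypothetical sample document, for illustration only.
SAMPLE = '<people><person name="Ann" age="31"/><person name="Bob"/></people>'
people = parse_people(SAMPLE)
```

The point is that step 2 touches every node a second time in Python, so its cost comes on top of whatever the parser itself takes.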
So, I'd consider using one of the following parsers directly:
- minidom -- But don't forget that there is C code (expat) underneath
minidom, so the parsing itself is not done in pure Python.
- ElementTree
- cElementTree -- Faster than plain ElementTree.
- lxml -- The following link gives some speed comparisons:
http://codespeak.net/lxml/performance.html
I tried parsing a 22 MB file. Here are the times:
minidom:
    real 0m1.426s
    user 0m1.200s
    sys  0m0.220s

lxml:
    real 0m0.195s
    user 0m0.130s
    sys  0m0.070s

ElementTree:
    real 0m0.446s
    user 0m0.390s
    sys  0m0.060s

cElementTree:
    real 0m0.271s
    user 0m0.230s
    sys  0m0.040s
Of course, times will vary depending on the contents of the
XML document and what you do after parsing it.
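
If you want to repeat the comparison on your own data, something like the following sketch would do. It is not the script I used for the numbers above: it times only the parse call rather than the whole process, and make_sample() builds a small synthetic document that you would replace with the bytes of your real file. lxml is a separate install, so it is included only if the import succeeds. On current Python 3, cElementTree no longer exists as a separate module; xml.etree.ElementTree uses the C accelerator automatically when it is available.

```python
import io
import time
from xml.dom import minidom
import xml.etree.ElementTree as ET


def make_sample(n_items):
    # Synthetic document (an assumption for this sketch); swap in the
    # bytes of your real file to get meaningful numbers.
    parts = ["<root>"]
    parts.extend('<item id="%d">value %d</item>' % (i, i) for i in range(n_items))
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def time_parse(parse, data):
    # Time a single parse of the document from an in-memory file object.
    start = time.perf_counter()
    parse(io.BytesIO(data))
    return time.perf_counter() - start


def available_parsers():
    parsers = {"minidom": minidom.parse, "ElementTree": ET.parse}
    try:
        from lxml import etree  # third-party, optional
        parsers["lxml"] = etree.parse
    except ImportError:
        pass
    return parsers


def compare(data):
    return {name: time_parse(parse, data)
            for name, parse in available_parsers().items()}


results = compare(make_sample(10000))
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print("%-12s %.3fs" % (name, seconds))
```

A single run like this is noisy, so repeat it a few times before drawing conclusions.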
Hope this helps.
- Dave
--
Dave Kuhlman
http://www.rexx.com/~dkuhlman