>From: Ashutosh Goyal <ashutoshgo...@gmail.com>
>To: generateds-users@lists.sourceforge.net
>Sent: Thu, March 18, 2010 1:50:50 AM
>Subject: [Generateds-users] large xml parser
>
> I am looking for most efficient way of parsing a big xml (50MB)
> creating python object. Current generateDS version is using
> minidom or sax parser. Is there any faster solution. What about
> lxml?

Ashutosh -

Yes, generateDS.py uses SAX to parse the XML Schema, *but* the code
that it generates uses minidom to parse the XML instance document.
Specifically, the generated code performs the following steps:

1. Parse the document with minidom.

2. Walk the minidom tree to create instances of the generated
   classes.

If speed is your primary need, you'd probably want to avoid the
second step above.  So, I'd consider the following:

- minidom -- But, don't forget that there is C code (expat)
  underneath minidom.

- ElementTree

- cElementTree -- Faster than plain ElementTree.

- lxml -- The following link gives some speed comparisons:
  http://codespeak.net/lxml/performance.html

I tried parsing a 22 MB file.  Here are the times:

                    real        user        sys
    minidom         0m1.426s    0m1.200s    0m0.220s
    lxml            0m0.195s    0m0.130s    0m0.070s
    ElementTree     0m0.446s    0m0.390s    0m0.060s
    cElementTree    0m0.271s    0m0.230s    0m0.040s

Of course, times will vary depending on the contents of the XML
document and what you do after parsing it.

Hope this helps.

- Dave

-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman
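
P.S. For anyone who wants to repeat the comparison on their own
data, here is a minimal sketch of a timing harness.  It is not the
script used for the numbers above: those came from the shell's
"time" command and so include interpreter start-up, while this
measures only the parse call in-process.  The helper names
(make_sample_xml, time_parsers) and the <root>/<item> document
layout are made up for illustration.  lxml is a third-party package
and is skipped if it is not installed.  cElementTree is left out
because it was removed in Python 3.9; since Python 3.3,
xml.etree.ElementTree uses the C accelerator automatically when it
is available.

```python
import time
import xml.dom.minidom
import xml.etree.ElementTree as ET


def make_sample_xml(n_items=1000):
    """Build a synthetic XML document (hypothetical layout, illustration only)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append('<item id="%d"><name>item%d</name></item>' % (i, i))
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def time_parsers(data):
    """Return {parser_name: (seconds, item_count)} for each available parser."""
    parsers = {
        "minidom": lambda d: len(
            xml.dom.minidom.parseString(d).getElementsByTagName("item")),
        "ElementTree": lambda d: len(ET.fromstring(d).findall("item")),
    }
    try:
        # lxml is optional; only time it when the import succeeds.
        from lxml import etree
        parsers["lxml"] = lambda d: len(etree.fromstring(d).findall("item"))
    except ImportError:
        pass

    results = {}
    for name, parse in parsers.items():
        start = time.perf_counter()
        count = parse(data)  # parse, then count <item> elements as a sanity check
        results[name] = (time.perf_counter() - start, count)
    return results


if __name__ == "__main__":
    for name, (seconds, count) in time_parsers(make_sample_xml(20000)).items():
        print("%-12s %8.4fs  (%d items)" % (name, seconds, count))
```

To time a real file, read it as bytes and pass the contents to
time_parsers() in place of the synthetic document.  As noted above,
results depend heavily on the document and on what you do with the
tree afterwards.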