On 05/05/2015 11:41, "Mario Kröplin" <linkr...@github.com> wrote:

Recently, I compared DOM parsers on an XML file of 100 MByte:

15.8 s tango.text.xml (SiegeLord/Tango-D2)
13.4 s ae.utils.xml (CyberShadow/ae)
  8.5 s xml.etree (Python)

Either the Tango DOM parser is slow compared to the Tango pull parser,
or the D2 port ruined the performance.
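
For anyone wanting to reproduce the xml.etree figure, a measurement along these lines might look like the sketch below. It is only an illustration: the helper names make_sample_xml and time_dom_parse are made up here, and the synthetic document is an assumption that stands in for the 100 MByte file, which is not available, so the timings will not match the ones quoted above.

```python
import io
import time
import xml.etree.ElementTree as ET


def make_sample_xml(n_items):
    """Build a synthetic XML document (hypothetical data, not the file from the post)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append('<item id="%d"><name>item %d</name></item>' % (i, i))
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def time_dom_parse(data):
    """Parse the bytes into a full element tree; return (seconds, element count)."""
    start = time.perf_counter()
    tree = ET.parse(io.BytesIO(data))
    elapsed = time.perf_counter() - start
    # Walk the tree once (outside the timed region) to confirm it was fully built.
    count = sum(1 for _ in tree.getroot().iter())
    return elapsed, count


if __name__ == "__main__":
    data = make_sample_xml(100_000)
    seconds, count = time_dom_parse(data)
    print("%.1f MByte, %d elements, %.3f s" % (len(data) / 1e6, count, seconds))
```

Parsing from memory keeps disk I/O out of the number, which may or may not match how the figures above were taken.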


fwiw I did some tests a couple of years back with https://launchpad.net/d2-xml on 20-odd megabyte files and found it faster than Tango. Unfortunately that would need some work to test now, as xmlp is abandoned and wouldn't build the last time I tried it :-(

I also had some success with https://github.com/opticron/kxml, though it had some issues with chuffy entity decoding performance.


Also, profiling showed a lot of time spent in the GC, and the recent improvements in that area might have changed things by now.
