"Benjamin Zores" wrote:
>> 4. pyxml seems to be faster than before. I need to make a speed
>>    comparison. Even if it is a little bit slower we should use
>>    it. Last time it was way slower, that's why we used libxml2.
>
> That would be good from GeeXboX's point of view (one less big
> dependency)

The dependency would then be pyxml instead of libxml2; see below.

> but from what I remember, Tack told me libxml2 was
> introduced because it is 100x faster than Python XML. If it takes 2 hours to
> parse XMLTV files with Python XML, then it might be worth fixing libxml2
> instead.

Here are some timings on a 300 KB TV.xml file:

| from xml.dom.minidom import parse
| dom1 = parse('/home/dmeyer/TV.xml')

0.296 seconds. This appears to be the standard Python minidom
parser. Now the one from pyxml:

| from _xmlplus.dom.minidom import parse as parse2
| dom2 = parse2('/home/dmeyer/TV.xml')

0.39 seconds. A bit slower, so the normal Python parser seems to be
better in this case. But so far we have only used minidom; next is the
real DOM parser:

| from xml.dom.ext.reader import PyExpat
| reader = PyExpat.Reader()
| doc = reader.fromStream(open('/home/dmeyer/TV.xml'))

8.21 seconds. This looks like the "100x faster" problem Tack was
talking about. But we don't need full DOM support; in fact, even a
simple SAX parser might be enough. Let's try:

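| import time
| from xml.sax import ContentHandler, ErrorHandler, make_parser
|
| class docHandler(ContentHandler):
|     # Ignore every event: we only measure raw parsing speed
|     def startElement(self, name, attrs):
|         pass
|     def characters(self, ch):
|         pass
|     def endElement(self, name):
|         pass
|
| class quietErrors(ErrorHandler):
|     # error() is an ErrorHandler callback; on a ContentHandler it is never called
|     def error(self, exception):
|         pass
|
| parser = make_parser()
| parser.setContentHandler(docHandler())
| parser.setErrorHandler(quietErrors())
| t1 = time.time()
| parser.parse(open('/home/dmeyer/TV.xml'))
| t2 = time.time()
| print(t2 - t1)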

0.055 seconds. In some cases this would be enough. And the SAX parser
parses the DTD, too (unlike libxml2 in my last test).
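
To show what "enough" could look like in practice, here is a minimal
sketch of a SAX handler that collects programme titles. It assumes the
usual XMLTV layout (`<programme>` elements with a `<title>` child); the
class name `TitleCollector` is just made up for this example, and it is
fed from a small inline string only so the snippet runs on its own.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Hypothetical handler: collect the text of each <title> inside a <programme>."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.titles = []
        self._in_programme = False
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == 'programme':
            self._in_programme = True
        elif name == 'title' and self._in_programme:
            self._in_title = True
            self._buffer = []

    def characters(self, ch):
        # characters() may be called several times for one text node, so buffer it
        if self._in_title:
            self._buffer.append(ch)

    def endElement(self, name):
        if name == 'title' and self._in_title:
            self.titles.append(''.join(self._buffer).strip())
            self._in_title = False
        elif name == 'programme':
            self._in_programme = False


# Assumed XMLTV-style sample, not taken from a real listing
SAMPLE = b"""<?xml version="1.0"?>
<tv>
  <channel id="ch1"><display-name>Channel One</display-name></channel>
  <programme start="20050101200000" channel="ch1"><title>News</title></programme>
  <programme start="20050101203000" channel="ch1"><title>Weather</title></programme>
</tv>"""

handler = TitleCollector()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)  # ['News', 'Weather']
```

Unlike the DOM versions, nothing but the titles is kept in memory, which
is why SAX stays fast even when the file grows.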

For comparison, here is libxml2 using its DOM parser rather than its
SAX parser (so compare it with the first result):

| import libxml2
| dom = libxml2.parseFile('/home/dmeyer/TV.xml')
| dom.freeDoc()  # libxml2 documents are not garbage-collected; free explicitly

0.022 seconds. Even faster than the pyxml SAX parser. libxml2 is more
than 10 times faster than minidom (0.022 vs. 0.296 seconds), but we are
talking about less than one second here. Tack: can you send me the huge
TV.xml file you had the problem with?

Summary: from my point of view, libxml2 causes more problems than it
solves. At least for the normal XML files we have (e.g. cxml config
files), the speed gain is negligible compared to the cost of an extra
dependency. The only exception could be a very large TV.xml file. What
file sizes do you have out there?


Dischi

-- 
Any time things appear to be going better, you have overlooked something.

_______________________________________________
Freevo-devel mailing list
Freevo-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freevo-devel