"Benjamin Zores" wrote: >> 4. pyxml seems to be faster than before. I need to make a speed >> comparison. Even if it is a little bit slower we should use >> it. Last time it was way slower, that's why we used libxml2. > > That would be good from GeeXboX point of view (one big dependency > less)
The dependency could be pyxml and not libxml2. See below > but from what i could remember, Tack told me libxmlé was > introduced cause 100x faster than python xml. If it takes 2hours to > parse xmltv fils with python xml than it might worth fixing libxml2 > instead. Here some debugging on a 300k TV.xml file: | from xml.dom.minidom import parse | dom1 = parse('/home/dmeyer/TV.xml') 0.296 seconds. This seems to be the normal Python XML minidom parser. Now the one from pyxml: | from _xmlplus.dom.minidom import parse as parse2 | dom2 = parse2('/home/dmeyer/TV.xml') 0.39 seconds. A bit slower, so the nromal Python parser seems to be better in this case. But we are only using minidom, next is the real dom parser: | from xml.dom.ext.reader import PyExpat | reader = PyExpat.Reader() | doc = reader.fromStream(open('/home/dmeyer/TV.xml')) 8.21 seconds. Looks like this is the 100% times faster problem Tack is talking about. But we don't need the full dom support, in fact maybe even a simple SAX parser would be ok. Let's try | from xml.sax import ContentHandler, make_parser | class docHandler(ContentHandler): | def startElement(self, name, attrs): | pass | def error(self, exception): | pass | def characters(self, ch): | pass | def endElement(self, name): | pass | dh = docHandler() | parser = make_parser() | parser.setContentHandler(dh) | t1 = time.time() | parser.parse(open('/home/dmeyer/TV.xml')) | t2 = time.time() 0.055 seconds. In same cases this would be enough. And the SAX parser parses the DTD, too (unlike libxml2 in my last test). As comparison, this is libxml2 using the dom not the sax parser (so compare it to the first result): | import libxml2 | dom = libxml2.parseFile('/home/dmeyer/TV.xml') 0.022 seconds. Even faster as the pyxml2 sax parser. libxml2 is about 10 times faster than minidom. But we are talking about less than one second here. Tack: can you send me the huge TV.xml file you had the problem with? 
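If someone wants to repeat the comparison on other file sizes, something
like the following might do as a starting point. It is only a sketch: it
is written for Python 3, it uses the stdlib minidom and SAX parsers only
(no pyxml, no libxml2), and the helper names (make_sample,
CountingHandler, time_call, compare) as well as the synthetic
XMLTV-like data are made up for illustration. They are not Freevo code
and the numbers will differ from the ones above.

```python
import time
import xml.sax
from xml.dom import minidom


def make_sample(n_programmes):
    """Build a synthetic XMLTV-like document (hypothetical layout) as bytes."""
    rows = [
        '<programme start="%d" channel="c%d"><title>Show %d</title></programme>'
        % (i, i % 10, i)
        for i in range(n_programmes)
    ]
    return ("<tv>" + "".join(rows) + "</tv>").encode("utf-8")


class CountingHandler(xml.sax.ContentHandler):
    """SAX handler that only counts start tags, so parsing cost dominates."""

    def __init__(self):
        super().__init__()
        self.elements = 0

    def startElement(self, name, attrs):
        self.elements += 1


def time_call(func, *args):
    """Return (elapsed seconds, result) for a single call."""
    t1 = time.perf_counter()
    result = func(*args)
    return time.perf_counter() - t1, result


def compare(data):
    """Time minidom and SAX on the same bytes and return the measurements."""
    handler = CountingHandler()
    dom_time, dom = time_call(minidom.parseString, data)
    sax_time, _ = time_call(xml.sax.parseString, data, handler)
    return {
        "minidom": dom_time,
        "sax": sax_time,
        # both parsers should have seen the same number of elements
        "dom_elements": len(dom.getElementsByTagName("*")),
        "sax_elements": handler.elements,
    }


if __name__ == "__main__":
    result = compare(make_sample(5000))
    print("minidom: %.3f s, sax: %.3f s" % (result["minidom"], result["sax"]))
```

Raising the argument to make_sample should give a rough idea of how the
two parsers scale; to test a real file, read it as bytes and pass that
to compare() instead.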

Summary: from my point of view, libxml2 brings more problems than
solutions. At least for the normal XML files we have (e.g. cxml config
files), the speed improvement does not justify the extra dependency.
The only difference could be a very huge TV.xml file. What file sizes
do you have out there?

Dischi

-- 
Any time things appear to be going better, you have overlooked something.
_______________________________________________ Freevo-devel mailing list Freevo-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/freevo-devel