On Tue, Sep 29, 2009 at 12:09:40AM +0200, Bertrand Juglas wrote:
> just to let you know about my progress:
> i'm trying BeautifulSoup python module to parse the HTML table of
> Fedora 11 Updates. i will push my code to bitbucket as bertux user.
> Have a nice day,
> hoping to be useful ;)

Thanks, cool!

I wrote some BeautifulSoup code a few years ago, and recently found
that BeautifulSoup has become pretty much unmaintained upstream, and
that some of its updates made it worse. It has also been quite slow.
You might find lxml's etree and XPath a little easier both to write
and to read, as well as much faster to run. I've seen massive
performance increases from making this change.

Here are some example changes from such a conversion:

- self.soup = BeautifulSoup(self.data)
+ self.et = etree.parse(StringIO(self.data), etree.HTMLParser())

- procTable = self.soup.html.find('table', title="Document(s)")
+ procTable = self.et.xpath("//tab...@title='Document(s)']")

- rows = (x for x in procTable('tr') if x('td'))
+ rows = procTable[0].xpath("tr/td/..")

- td = row('td')
+ td = row.xpath('td')

Note that xpath() returns a list of matches, which is why the
converted code indexes procTable[0].

Just a thought: the conversion isn't hard, and it can be useful.

More info on XPath at http://www.w3schools.com/XPath/default.asp

Thanks again!

_______________________________________________
Foresight-devel mailing list
Foresight-devel@lists.rpath.org
http://lists.rpath.org/mailman/listinfo/foresight-devel
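
For anyone who wants to try the lxml approach end to end, here is a minimal, self-contained sketch in Python 3. It is an illustration under assumptions, not the code from the conversion above: SAMPLE_HTML and parse_rows are hypothetical names, the sample markup is made up (the real Fedora 11 Updates page may look different), and the table selector is written as //table[@title='Document(s)'], which is an assumption based on the BeautifulSoup find() call it replaces.

```python
from io import StringIO

from lxml import etree

# Hypothetical sample page; the real Fedora 11 Updates markup may differ.
SAMPLE_HTML = """
<html><body>
<table title="Document(s)">
  <tr><th>Package</th><th>Version</th></tr>
  <tr><td>foo</td><td>1.0-1</td></tr>
  <tr><td>bar</td><td>2.3-4</td></tr>
</table>
</body></html>
"""


def parse_rows(data):
    """Return the cell text of each data row as a list of lists."""
    # HTMLParser tolerates the sloppy markup that real pages tend to contain.
    tree = etree.parse(StringIO(data), etree.HTMLParser())

    # xpath() always returns a list, so check it before indexing.
    tables = tree.xpath("//table[@title='Document(s)']")
    if not tables:
        return []

    # Keep only rows that contain at least one <td>; this has the same
    # effect as the "tr/td/.." idiom and also matches rows under <tbody>.
    rows = tables[0].xpath(".//tr[td]")
    return [[(td.text or "").strip() for td in row.xpath("td")] for row in rows]


if __name__ == "__main__":
    for cells in parse_rows(SAMPLE_HTML):
        print(cells)
```

The header row is skipped because it holds only <th> cells. Returning an empty list when the table is missing is a design choice for this sketch; raising an exception would be just as reasonable if a missing table should count as an error.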