On Tue, Sep 29, 2009 at 12:09:40AM +0200, Bertrand Juglas wrote:
> just to let you know about my progress:
> i'm trying BeautifulSoup python module to parse the HTML table of
> Fedora 11 Updates. i will push my code to bitbucket as bertux user.
> Have a nice day,
> hoping to be useful ;)

Thanks, cool!

I wrote some BeautifulSoup code a few years ago, and recently found
that BeautifulSoup has become pretty much unmaintained upstream, and
that some of its updates made it worse.  It has also been quite
slow -- you might find lxml etree and XPath a little easier both to
write and to read, as well as much faster to run.  I've seen massive
performance increases from making this change.  Here are some example
changes from such a conversion:

-        self.soup = BeautifulSoup(self.data)
+        self.et = etree.parse(StringIO(self.data), etree.HTMLParser())

-        procTable = self.soup.html.find('table', title="Document(s)")
+        procTable = self.et.xpath("//table[@title='Document(s)']")

-        rows = (x for x in procTable('tr') if x('td'))
+        rows = procTable[0].xpath("tr/td/..")

-            td = row('td')
+            td = row.xpath('td')

Just a thought -- the conversion isn't hard, and it can be useful.
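
For what it's worth, here is a minimal self-contained sketch of the lxml
side of that conversion.  The sample HTML and the helper name
extract_rows are made up for illustration, not taken from any real
page or from the code above; it assumes lxml is installed and uses
Python 3's io.StringIO in place of the old StringIO module.

```python
from io import StringIO

from lxml import etree

# Hypothetical sample data; the real page layout may well differ.
SAMPLE_HTML = """
<html><body>
<table title="Document(s)">
  <tr><th>Name</th><th>Version</th></tr>
  <tr><td>foo</td><td>1.0</td></tr>
  <tr><td>bar</td><td>2.1</td></tr>
</table>
</body></html>
"""


def extract_rows(data, title):
    """Return the text of each <td> cell, one list per data row."""
    # Parse possibly sloppy HTML into an element tree.
    tree = etree.parse(StringIO(data), etree.HTMLParser())
    # Pass the title as an XPath variable instead of pasting it
    # into the expression string.
    tables = tree.xpath("//table[@title=$title]", title=title)
    if not tables:
        return []
    # "tr/td/.." selects every <tr> that has at least one <td> child,
    # so header-only rows are skipped.
    rows = tables[0].xpath("tr/td/..")
    return [[td.text for td in row.xpath("td")] for row in rows]
```

Calling extract_rows(SAMPLE_HTML, "Document(s)") on the sample above
should give one list per data row and leave out the header row.  If the
real table nests its rows inside a tbody element, the row expression
would need to become "tbody/tr/td/.." or ".//tr/td/..".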

More info on XPath at http://www.w3schools.com/XPath/default.asp

Thanks again!