Package: python-beautifulsoup
Version: 3.0.4-1

Hi,

An example page causing this bug is http://www.vupp.cz/czvupp/ (URL via dmoz; it seems to be a food research institute). It also only happens when I use convertEntities=BeautifulSoup.HTML_ENTITIES.

The problem seems to be here:

    print UnicodeDammit(b, smartQuotesTo=None).unicode

prints "None", which is then fed to the SGML parser in the _feed method.

Maybe (!) it could be fixed this way:

--- BeautifulSoup.py    2007-04-10 21:39:11.000000000 +0200
+++ /tmp/BeautifulSoup.py       2007-12-29 19:58:15.000000000 +0100
@@ -958,7 +958,7 @@
             dammit = UnicodeDammit\
                      (markup, [self.fromEncoding, inDocumentEncoding],
                       smartQuotesTo=self.smartQuotesTo)
-            markup = dammit.unicode
+            markup = dammit.unicode or markup
             self.originalEncoding = dammit.originalEncoding
         if markup:
             if self.markupMassage:

But I'm not entirely sure what dammit is supposed to do, so this may not be the proper way of fixing it. It also leaves originalEncoding=None.

--- System information. ---
Architecture: i386
Kernel:       Linux 2.6.23.9

Debian Release: lenny/sid
  500 unstable        www.debian-multimedia.org
  500 unstable        ftp.de.debian.org
    1 experimental    ftp.de.debian.org

--- Package information. ---
Depends             (Version) | Installed
=============================-+-===========
python               (>= 2.2) | 2.4.4-6
python-support       (>= 0.2) | 0.7.5

best regards,
Erich Schubert
--
 erich@(vitavonni.de|debian.org)    --    GPG Key ID: 4B3A135C    (o_
  To understand recursion you first need to understand recursion.  //\
  Everything changes as soon as you change yourself.               V_/_
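
For illustration only, here is a minimal stand-alone sketch of the fallback idea behind the `dammit.unicode or markup` change. It does not use BeautifulSoup at all: `try_decode` and `prepare_markup` are hypothetical helpers written for this example (in Python 3), and the real UnicodeDammit logic is more involved. The sketch assumes the detector returns None when none of the candidate encodings work, which matches the "None" printed above.

```python
def try_decode(data, encodings):
    """Hypothetical stand-in for an encoding detector.

    Returns (text, encoding) for the first candidate encoding that decodes
    `data` cleanly, or (None, None) if every candidate fails.
    """
    for enc in encodings:
        if enc is None:
            continue  # skip unknown/unset candidates
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
    return None, None


def prepare_markup(data, encodings):
    """Decode `data` if possible, otherwise hand back the raw input."""
    text, original_encoding = try_decode(data, encodings)
    # Same idea as `dammit.unicode or markup`: never pass None downstream.
    markup = text or data
    return markup, original_encoding


raw = b"caf\xe9"  # Latin-1 bytes; not valid UTF-8

# Detection fails: the raw bytes survive, but the encoding stays None.
print(prepare_markup(raw, ["utf-8"]))

# Detection succeeds once a suitable candidate is offered.
print(prepare_markup(raw, ["utf-8", "latin-1"]))
```

As in the proposed patch, the fallback keeps the parser from receiving None but still reports no original encoding. Note also that `or` treats an empty string as a failure too, which is harmless here but may not be what every caller wants.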