Package: python-beautifulsoup
Version: 3.1.0.1-1
Severity: important
Since the recent upgrade from 3.0.7 to 3.1.0, BeautifulSoup can no
longer parse HTML pages that contain particular forms of embedded
JavaScript.
Here is a small example that parses correctly with 3.0.7:
<html>
<head>
<title>Not-So-Beautiful Soup</title>
</head>
<body>
<script>
function legalJS() {
var str = '</p>';
return 0<str.length;
}
</script>
</body>
</html>
With 3.1.0, it causes this failure:
  File "./souptest.py", line 7, in <module>
    soup = BeautifulSoup(page)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/var/lib/python-support/python2.5/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.5/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.5/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.5/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.5/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.5/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 9, column 28
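The traceback ends inside the standard library's HTMLParser rather than in BeautifulSoup's own code, so it may help triage to feed the same page directly to a stdlib parser. The following is only a rough sketch of such a check, written for a current Python 3 where the module is named html.parser (on Python 2.5 it was HTMLParser). It is not the souptest.py from the traceback, which is not part of this report. The names PAGE, ScriptCollector, and parse_page are made up for illustration.

```python
from html.parser import HTMLParser

# The sample page from this report, reproduced as a string.
PAGE = """<html>
<head>
<title>Not-So-Beautiful Soup</title>
</head>
<body>
<script>
function legalJS() {
var str = '</p>';
return 0<str.length;
}
</script>
</body>
</html>
"""


class ScriptCollector(HTMLParser):
    """Hypothetical helper: records start tags, end tags, and <script> text."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []
        self.script_chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        self.end_tags.append(tag)
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in more than one chunk, so collect and join later.
        if self._in_script:
            self.script_chunks.append(data)


def parse_page(markup):
    """Feed the markup to the stdlib parser and return the collector."""
    parser = ScriptCollector()
    parser.feed(markup)
    parser.close()
    return parser


if __name__ == "__main__":
    result = parse_page(PAGE)
    print("start tags:", result.start_tags)
    print("end tags:  ", result.end_tags)
    print("script text:")
    print("".join(result.script_chunks))
```

If the parser treats the body of the script element as raw text, the '</p>' string and the 0<str.length comparison should show up in the script text and no p end tag should be reported. An exception or a stray tag here would point at the parser underneath rather than at BeautifulSoup itself.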
-- System Information:
Debian Release: 5.0
APT prefers testing
APT policy: (990, 'testing'), (500, 'stable'), (400, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Kernel: Linux 2.6.26-1-686 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages python-beautifulsoup depends on:
ii python 2.5.2-3 An interactive high-level object-o
ii python-support 0.8.7 automated rebuilding support for P
python-beautifulsoup recommends no packages.
python-beautifulsoup suggests no packages.
-- no debconf information