Hello community,
here is the log from the commit of package python-beautifulsoup4 for
openSUSE:Factory checked in at 2013-06-29 19:43:22
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-beautifulsoup4 (Old)
and /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-beautifulsoup4"
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-beautifulsoup4/python-beautifulsoup4.changes 2013-06-18 10:36:16.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new/python-beautifulsoup4.changes 2013-06-29 22:25:55.000000000 +0200
@@ -1,0 +2,37 @@
+Thu Jun 27 13:32:06 UTC 2013 - [email protected]
+
+- Update upstream URL
+
+-------------------------------------------------------------------
+Tue Jun 25 11:52:34 UTC 2013 - [email protected]
+
+- update to 4.2.1:
+ * The default XML formatter will now replace ampersands even if they
+ appear to be part of entities. That is, "&lt;" will become
+ "&amp;lt;". The old code was left over from Beautiful Soup 3, which
+ didn't always turn entities into Unicode characters.
+
+ If you really want the old behavior (maybe because you add new
+ strings to the tree, those strings include entities, and you want
+ the formatter to leave them alone on output), it can be found in
+ EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
+
+ * Gave new_string() the ability to create subclasses of
+ NavigableString. [bug=1181986]
+
+ * Fixed another bug by which the html5lib tree builder could create a
+ disconnected tree. [bug=1182089]
+
+ * The .previous_element of a BeautifulSoup object is now always None,
+ not the last element to be parsed. [bug=1182089]
+
+ * Fixed test failures when lxml is not installed. [bug=1181589]
+
+ * html5lib now supports Python 3. Fixed some Python 2-specific
+ code in the html5lib test suite. [bug=1181624]
+
+ * The html.parser treebuilder can now handle numeric attributes in
+ text when the hexadecimal name of the attribute starts with a
+ capital X. Patch by Tim Shirley. [bug=1186242]
+
+-------------------------------------------------------------------
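
As a quick illustration of the XML-formatter change described in the changelog above, here is a minimal sketch (assuming the 4.2.1 package below is installed); the expected output follows the changelog text, not a verified run:

    from bs4.dammit import EntitySubstitution

    # New 4.2.1 default: ampersands are escaped even when they look like
    # the start of an entity.
    print(EntitySubstitution.substitute_xml("&Aacute;T&T"))
    # expected: &amp;Aacute;T&amp;T

    # The old, entity-preserving behaviour is kept under an explicit name.
    print(EntitySubstitution.substitute_xml_containing_entities("&Aacute;T&T"))
    # expected: &Aacute;T&amp;T
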
Old:
----
beautifulsoup4-4.2.0.tar.gz
New:
----
beautifulsoup4-4.2.1.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-beautifulsoup4.spec ++++++
--- /var/tmp/diff_new_pack.luDQsR/_old 2013-06-29 22:25:56.000000000 +0200
+++ /var/tmp/diff_new_pack.luDQsR/_new 2013-06-29 22:25:56.000000000 +0200
@@ -16,9 +16,8 @@
#
-%define _name beautifulsoup4
-Name: python-%{_name}
-Version: 4.2.0
+Name: python-beautifulsoup4
+Version: 4.2.1
Release: 0
Summary: HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
License: MIT
@@ -26,20 +25,19 @@
Url: http://www.crummy.com/software/BeautifulSoup/
Source: http://pypi.python.org/packages/source/b/beautifulsoup4/beautifulsoup4-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-build
-BuildRequires: python-Sphinx
BuildRequires: python-devel >= 2.6
+# Documentation requirements:
+BuildRequires: python-Sphinx
+# Test requirements:
BuildRequires: python-html5lib
BuildRequires: python-lxml
BuildRequires: python-nose
Requires: python-html5lib
Requires: python-lxml
-%{py_requires}
-
-# build fails for SLE11 64bit due to 'noarch'
-%if 0%{?suse_version} >= 1140
-BuildArch: noarch
-%else
+%if 0%{?suse_version} && 0%{?suse_version} <= 1110
%{!?python_sitelib: %global python_sitelib %(python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()")}
+%else
+BuildArch: noarch
%endif
%description
@@ -79,32 +77,26 @@
%prep
-%setup -q -n %{_name}-%{version}
+%setup -q -n beautifulsoup4-%{version}
%build
-CFLAGS="%{optflags}" python setup.py build
+python setup.py build
+cd doc && make html
%install
-python setup.py install \
- --prefix=%{_prefix} \
- --root=%{buildroot}
-cd doc
-make html
+python setup.py install --prefix=%{_prefix} --root=%{buildroot}
-%if 0%{?suse_version} >= 1140
%check
nosetests
-%endif
%files
%defattr(-,root,root)
%doc AUTHORS.txt COPYING.txt
%{python_sitelib}/bs4/
-%{python_sitelib}/%{_name}-%{version}-py*.egg-info
+%{python_sitelib}/beautifulsoup4-%{version}-py*.egg-info
%files doc
%defattr(-,root,root)
-%doc NEWS.txt README.txt TODO.txt
-%doc doc/build/html
+%doc NEWS.txt README.txt TODO.txt doc/build/html
%changelog
++++++ beautifulsoup4-4.2.0.tar.gz -> beautifulsoup4-4.2.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/NEWS.txt new/beautifulsoup4-4.2.1/NEWS.txt
--- old/beautifulsoup4-4.2.0/NEWS.txt 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/NEWS.txt 2013-05-31 15:49:44.000000000 +0200
@@ -1,3 +1,33 @@
+= 4.2.1 (20130531) =
+
+* The default XML formatter will now replace ampersands even if they
+ appear to be part of entities. That is, "&lt;" will become
+ "&amp;lt;". The old code was left over from Beautiful Soup 3, which
+ didn't always turn entities into Unicode characters.
+
+ If you really want the old behavior (maybe because you add new
+ strings to the tree, those strings include entities, and you want
+ the formatter to leave them alone on output), it can be found in
+ EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
+
+* Gave new_string() the ability to create subclasses of
+ NavigableString. [bug=1181986]
+
+* Fixed another bug by which the html5lib tree builder could create a
+ disconnected tree. [bug=1182089]
+
+* The .previous_element of a BeautifulSoup object is now always None,
+ not the last element to be parsed. [bug=1182089]
+
+* Fixed test failures when lxml is not installed. [bug=1181589]
+
+* html5lib now supports Python 3. Fixed some Python 2-specific
+ code in the html5lib test suite. [bug=1181624]
+
+* The html.parser treebuilder can now handle numeric attributes in
+ text when the hexadecimal name of the attribute starts with a
+ capital X. Patch by Tim Shirley. [bug=1186242]
+
= 4.2.0 (20130514) =
* The Tag.select() method now supports a much wider variety of CSS
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/PKG-INFO new/beautifulsoup4-4.2.1/PKG-INFO
--- old/beautifulsoup4-4.2.0/PKG-INFO 2013-05-15 14:43:52.000000000 +0200
+++ new/beautifulsoup4-4.2.1/PKG-INFO 2013-05-31 15:54:14.000000000 +0200
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: beautifulsoup4
-Version: 4.2.0
+Version: 4.2.1
Summary: UNKNOWN
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/__init__.py new/beautifulsoup4-4.2.1/bs4/__init__.py
--- old/beautifulsoup4-4.2.0/bs4/__init__.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/__init__.py 2013-05-31 15:42:38.000000000 +0200
@@ -17,7 +17,7 @@
"""
__author__ = "Leonard Richardson ([email protected])"
-__version__ = "4.2.0"
+__version__ = "4.2.1"
__copyright__ = "Copyright (c) 2004-2013 Leonard Richardson"
__license__ = "MIT"
@@ -201,9 +201,9 @@
"""Create a new tag associated with this soup."""
return Tag(None, self.builder, name, namespace, nsprefix, attrs)
- def new_string(self, s):
+ def new_string(self, s, subclass=NavigableString):
"""Create a new NavigableString associated with this soup."""
- navigable = NavigableString(s)
+ navigable = subclass(s)
navigable.setup()
return navigable
@@ -245,14 +245,14 @@
o = containerClass(currentData)
self.object_was_parsed(o)
- def object_was_parsed(self, o, parent=None, previous_element=None):
+ def object_was_parsed(self, o, parent=None, most_recent_element=None):
"""Add an object to the parse tree."""
parent = parent or self.currentTag
- previous_element = previous_element or self.previous_element
- o.setup(parent, previous_element)
- if self.previous_element:
- self.previous_element.next_element = o
- self.previous_element = o
+ most_recent_element = most_recent_element or self._most_recent_element
+ o.setup(parent, most_recent_element)
+ if most_recent_element is not None:
+ most_recent_element.next_element = o
+ self._most_recent_element = o
parent.contents.append(o)
def _popToTag(self, name, nsprefix=None, inclusivePop=True):
@@ -297,12 +297,12 @@
return None
tag = Tag(self, self.builder, name, namespace, nsprefix, attrs,
- self.currentTag, self.previous_element)
+ self.currentTag, self._most_recent_element)
if tag is None:
return tag
- if self.previous_element:
- self.previous_element.next_element = tag
- self.previous_element = tag
+ if self._most_recent_element:
+ self._most_recent_element.next_element = tag
+ self._most_recent_element = tag
self.pushTag(tag)
return tag
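
A short sketch of what the bs4/__init__.py changes above mean for callers (a hypothetical snippet, not part of the package; the expected values follow the changelog):

    from bs4 import BeautifulSoup, Comment

    soup = BeautifulSoup("<a>text</a>")

    # The soup object's .previous_element is now always None rather than
    # the last element parsed.
    print(soup.previous_element)       # expected: None

    # new_string() can now build NavigableString subclasses such as Comment.
    comment = soup.new_string("a comment", Comment)
    print(type(comment).__name__)      # expected: Comment
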
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/__init__.py new/beautifulsoup4-4.2.1/bs4/builder/__init__.py
--- old/beautifulsoup4-4.2.0/bs4/builder/__init__.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/builder/__init__.py 2013-05-20 20:58:23.000000000 +0200
@@ -152,7 +152,7 @@
tag_specific = self.cdata_list_attributes.get(
tag_name.lower(), [])
for cdata_list_attr in itertools.chain(universal, tag_specific):
- if cdata_list_attr in dict(attrs):
+ if cdata_list_attr in attrs:
# Basically, we have a "class" attribute whose
# value is a whitespace-separated list of CSS
# classes. Split it into a list.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_html5lib.py new/beautifulsoup4-4.2.1/bs4/builder/_html5lib.py
--- old/beautifulsoup4-4.2.0/bs4/builder/_html5lib.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/builder/_html5lib.py 2013-05-20 18:01:18.000000000 +0200
@@ -131,6 +131,7 @@
old_element = self.element.contents[-1]
new_element = self.soup.new_string(old_element + node.element)
old_element.replace_with(new_element)
+ self.soup._most_recent_element = new_element
else:
self.soup.object_was_parsed(node.element, parent=self.element)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_htmlparser.py new/beautifulsoup4-4.2.1/bs4/builder/_htmlparser.py
--- old/beautifulsoup4-4.2.0/bs4/builder/_htmlparser.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/builder/_htmlparser.py 2013-05-31 15:48:27.000000000 +0200
@@ -58,6 +58,8 @@
# it's fixed.
if name.startswith('x'):
real_name = int(name.lstrip('x'), 16)
+ elif name.startswith('X'):
+ real_name = int(name.lstrip('X'), 16)
else:
real_name = int(name)
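
The hunk above adds the capital-X branch for hexadecimal character references in the html.parser treebuilder. A rough usage sketch (the expected output reflects the intent of the patch, not a verified run):

    from bs4 import BeautifulSoup

    # "&#Xf1;" uses a capital X, which html.parser previously failed on.
    soup = BeautifulSoup("<p>pi&#Xf1;ata</p>", "html.parser")
    print(soup.p.string)   # expected: piñata
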
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/builder/_lxml.py new/beautifulsoup4-4.2.1/bs4/builder/_lxml.py
--- old/beautifulsoup4-4.2.0/bs4/builder/_lxml.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/builder/_lxml.py 2013-05-20 15:09:43.000000000 +0200
@@ -3,6 +3,7 @@
'LXMLTreeBuilder',
]
+from io import BytesIO
from StringIO import StringIO
import collections
from lxml import etree
@@ -75,7 +76,9 @@
dammit.contains_replacement_characters)
def feed(self, markup):
- if isinstance(markup, basestring):
+ if isinstance(markup, bytes):
+ markup = BytesIO(markup)
+ elif isinstance(markup, unicode):
markup = StringIO(markup)
# Call feed() at least once, even if the markup is empty,
# or the parser won't be initialized.
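
With the BytesIO/StringIO split above, the lxml-based XML builder accepts both byte strings and unicode markup. A minimal sketch (Python 2 era, matching the code above; assumes lxml is installed):

    from bs4 import BeautifulSoup

    # Byte strings are wrapped in BytesIO, unicode strings in StringIO,
    # before being handed to lxml's feed-based parser.
    BeautifulSoup(b"<root>bytes</root>", "xml")
    BeautifulSoup(u"<root>unicode</root>", "xml")
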
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/dammit.py new/beautifulsoup4-4.2.1/bs4/dammit.py
--- old/beautifulsoup4-4.2.0/bs4/dammit.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/dammit.py 2013-05-20 20:58:23.000000000 +0200
@@ -81,6 +81,8 @@
"&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)"
")")
+ AMPERSAND_OR_BRACKET = re.compile("([<>&])")
+
@classmethod
def _substitute_html_entity(cls, matchobj):
entity = cls.CHARACTER_TO_HTML_ENTITY.get(matchobj.group(0))
@@ -134,6 +136,28 @@
def substitute_xml(cls, value, make_quoted_attribute=False):
"""Substitute XML entities for special XML characters.
+ :param value: A string to be substituted. The less-than sign
+ will become &lt;, the greater-than sign will become &gt;,
+ and any ampersands will become &amp;. If you want ampersands
+ that appear to be part of an entity definition to be left
+ alone, use substitute_xml_containing_entities() instead.
+
+ :param make_quoted_attribute: If True, then the string will be
+ quoted, as befits an attribute value.
+ """
+ # Escape angle brackets and ampersands.
+ value = cls.AMPERSAND_OR_BRACKET.sub(
+ cls._substitute_xml_entity, value)
+
+ if make_quoted_attribute:
+ value = cls.quoted_attribute_value(value)
+ return value
+
+ @classmethod
+ def substitute_xml_containing_entities(
+ cls, value, make_quoted_attribute=False):
+ """Substitute XML entities for special XML characters.
+
:param value: A string to be substituted. The less-than sign will
become &lt;, the greater-than sign will become &gt;, and any
ampersands that are not part of an entity definition will
@@ -151,6 +175,7 @@
value = cls.quoted_attribute_value(value)
return value
+
@classmethod
def substitute_html(cls, s):
"""Replace certain Unicode characters with named HTML entities.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/diagnose.py new/beautifulsoup4-4.2.1/bs4/diagnose.py
--- old/beautifulsoup4-4.2.0/bs4/diagnose.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/diagnose.py 2013-05-20 17:07:53.000000000 +0200
@@ -4,8 +4,11 @@
from bs4 import BeautifulSoup, __version__
from bs4.builder import builder_registry
import os
+import random
+import time
import traceback
import sys
+import cProfile
def diagnose(data):
"""Diagnostic suite for isolating common problems."""
@@ -70,32 +73,36 @@
class AnnouncingParser(HTMLParser):
"""Announces HTMLParser parse events, without doing anything else."""
+
+ def _p(self, s):
+ print(s)
+
def handle_starttag(self, name, attrs):
- print "%s START" % name
+ self._p("%s START" % name)
def handle_endtag(self, name):
- print "%s END" % name
+ self._p("%s END" % name)
def handle_data(self, data):
- print "%s DATA" % data
+ self._p("%s DATA" % data)
def handle_charref(self, name):
- print "%s CHARREF" % name
+ self._p("%s CHARREF" % name)
def handle_entityref(self, name):
- print "%s ENTITYREF" % name
+ self._p("%s ENTITYREF" % name)
def handle_comment(self, data):
- print "%s COMMENT" % data
+ self._p("%s COMMENT" % data)
def handle_decl(self, data):
- print "%s DECL" % data
+ self._p("%s DECL" % data)
def unknown_decl(self, data):
- print "%s UNKNOWN-DECL" % data
+ self._p("%s UNKNOWN-DECL" % data)
def handle_pi(self, data):
- print "%s PI" % data
+ self._p("%s PI" % data)
def htmlparser_trace(data):
"""Print out the HTMLParser events that occur during parsing.
@@ -106,5 +113,66 @@
parser = AnnouncingParser()
parser.feed(data)
+_vowels = "aeiou"
+_consonants = "bcdfghjklmnpqrstvwxyz"
+
+def rword(length=5):
+ "Generate a random word-like string."
+ s = ''
+ for i in range(length):
+ if i % 2 == 0:
+ t = _consonants
+ else:
+ t = _vowels
+ s += random.choice(t)
+ return s
+
+def rsentence(length=4):
+ "Generate a random sentence-like string."
+ return " ".join(rword(random.randint(4,9)) for i in range(length))
+
+def rdoc(num_elements=1000):
+ """Randomly generate an invalid HTML document."""
+ tag_names = ['p', 'div', 'span', 'i', 'b', 'script', 'table']
+ elements = []
+ for i in range(num_elements):
+ choice = random.randint(0,3)
+ if choice == 0:
+ # New tag.
+ tag_name = random.choice(tag_names)
+ elements.append("<%s>" % tag_name)
+ elif choice == 1:
+ elements.append(rsentence(random.randint(1,4)))
+ elif choice == 2:
+ # Close a tag.
+ tag_name = random.choice(tag_names)
+ elements.append("</%s>" % tag_name)
+ return "<html>" + "\n".join(elements) + "</html>"
+
+def benchmark_parsers(num_elements=100000):
+ """Very basic head-to-head performance benchmark."""
+ print "Comparative parser benchmark on Beautiful Soup %s" % __version__
+ data = rdoc(num_elements)
+ print "Generated a large invalid HTML document (%d bytes)." % len(data)
+
+ for parser in ["lxml", ["lxml", "html"], "html5lib", "html.parser"]:
+ success = False
+ try:
+ a = time.time()
+ soup = BeautifulSoup(data, parser)
+ b = time.time()
+ success = True
+ except Exception, e:
+ print "%s could not parse the markup." % parser
+ traceback.print_exc()
+ if success:
+ print "BS4+%s parsed the markup in %.2fs." % (parser, b-a)
+
+ from lxml import etree
+ a = time.time()
+ etree.HTML(data)
+ b = time.time()
+ print "Raw lxml parsed the markup in %.2fs." % (b-a)
+
if __name__ == '__main__':
diagnose(sys.stdin.read())
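
The helpers added to bs4.diagnose above can be driven directly; a small usage sketch (document size chosen arbitrarily; assumes the optional parsers are installed):

    from bs4 import diagnose

    markup = diagnose.rdoc(200)       # a random, deliberately invalid HTML document
    print(len(markup))
    diagnose.benchmark_parsers(200)   # times lxml, html5lib, html.parser and raw lxml
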
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/testing.py new/beautifulsoup4-4.2.1/bs4/testing.py
--- old/beautifulsoup4-4.2.0/bs4/testing.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/testing.py 2013-05-31 15:46:18.000000000 +0200
@@ -228,12 +228,14 @@
expect = u'<p id="pi\N{LATIN SMALL LETTER N WITH TILDE}ata"></p>'
self.assertSoupEquals('<p id="pi&#241;ata"></p>', expect)
self.assertSoupEquals('<p id="pi&#xf1;ata"></p>', expect)
+ self.assertSoupEquals('<p id="pi&#Xf1;ata"></p>', expect)
self.assertSoupEquals('<p id="pi&ntilde;ata"></p>', expect)
def test_entities_in_text_converted_to_unicode(self):
expect = u'<p>pi\N{LATIN SMALL LETTER N WITH TILDE}ata</p>'
self.assertSoupEquals("<p>piñata</p>", expect)
self.assertSoupEquals("<p>piñata</p>", expect)
+ self.assertSoupEquals("<p>piñata</p>", expect)
self.assertSoupEquals("<p>piñata</p>", expect)
def test_quot_entity_converted_to_quotation_mark(self):
@@ -246,6 +248,12 @@
self.assertSoupEquals("�", expect)
self.assertSoupEquals("�", expect)
+ def test_multipart_strings(self):
+ "Mostly to prevent a recurrence of a bug in the html5lib treebuilder."
+ soup = self.soup("<html><h2>\nfoo</h2><p></p></html>")
+ self.assertEqual("p", soup.h2.string.next_element.name)
+ self.assertEqual("p", soup.p.name)
+
def test_basic_namespaces(self):
"""Parsers don't need to *understand* namespaces, but at the
very least they should not choke on namespaces or lose
@@ -464,6 +472,18 @@
self.assertEqual(
soup.encode("utf-8"), markup)
+ def test_formatter_processes_script_tag_for_xml_documents(self):
+ doc = """
+ <script type="text/javascript">
+ </script>
+"""
+ soup = BeautifulSoup(doc, "xml")
+ # lxml would have stripped this while parsing, but we can add
+ # it later.
+ soup.script.string = 'console.log("< < hey > > ");'
+ encoded = soup.encode()
+ self.assertTrue(b"< < hey > >" in encoded)
+
def test_popping_namespaced_tag(self):
markup = '<rss xmlns:dc="foo"><dc:creator>b</dc:creator><dc:date>2012-07-02T20:33:42Z</dc:date><dc:rights>c</dc:rights><image>d</image></rss>'
soup = self.soup(markup)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_html5lib.py new/beautifulsoup4-4.2.1/bs4/tests/test_html5lib.py
--- old/beautifulsoup4-4.2.0/bs4/tests/test_html5lib.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/tests/test_html5lib.py 2013-05-20 15:33:11.000000000 +0200
@@ -69,4 +69,4 @@
</html>'''
soup = self.soup(markup)
# Verify that we can reach the <p> tag; this means the tree is connected.
- self.assertEquals("<p>foo</p>", soup.p.encode())
+ self.assertEqual(b"<p>foo</p>", soup.p.encode())
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_lxml.py new/beautifulsoup4-4.2.1/bs4/tests/test_lxml.py
--- old/beautifulsoup4-4.2.0/bs4/tests/test_lxml.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/tests/test_lxml.py 2013-05-20 15:14:40.000000000 +0200
@@ -10,6 +10,7 @@
LXML_VERSION = lxml.etree.LXML_VERSION
except ImportError, e:
LXML_PRESENT = False
+ LXML_VERSION = (0,)
from bs4 import (
BeautifulSoup,
@@ -47,7 +48,7 @@
# test if an old version of lxml is installed.
@skipIf(
- LXML_VERSION < (2,3,5,0),
+ not LXML_PRESENT or LXML_VERSION < (2,3,5,0),
"Skipping doctype test for old version of lxml to avoid segfault.")
def test_empty_doctype(self):
soup = self.soup("<!DOCTYPE>")
@@ -85,4 +86,3 @@
@property
def default_builder(self):
return LXMLTreeBuilderForXML()
-
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_soup.py new/beautifulsoup4-4.2.1/bs4/tests/test_soup.py
--- old/beautifulsoup4-4.2.0/bs4/tests/test_soup.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/tests/test_soup.py 2013-05-20 20:58:23.000000000 +0200
@@ -125,9 +125,14 @@
def test_xml_quoting_handles_ampersands(self):
self.assertEqual(self.sub.substitute_xml("AT&T"), "AT&T")
- def test_xml_quoting_ignores_ampersands_when_they_are_part_of_an_entity(self):
+ def test_xml_quoting_including_ampersands_when_they_are_part_of_an_entity(self):
self.assertEqual(
self.sub.substitute_xml("&Aacute;T&amp;T"),
+ "&amp;Aacute;T&amp;T")
+
+ def test_xml_quoting_ignoring_ampersands_when_they_are_part_of_an_entity(self):
+ self.assertEqual(
+ self.sub.substitute_xml_containing_entities("&Aacute;T&amp;T"),
"&Aacute;T&amp;T")
def test_quotes_not_html_substituted(self):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/bs4/tests/test_tree.py new/beautifulsoup4-4.2.1/bs4/tests/test_tree.py
--- old/beautifulsoup4-4.2.0/bs4/tests/test_tree.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/bs4/tests/test_tree.py 2013-05-31 15:43:04.000000000 +0200
@@ -689,6 +689,12 @@
self.assertEqual("foo", s)
self.assertTrue(isinstance(s, NavigableString))
+ def test_new_string_can_create_navigablestring_subclass(self):
+ soup = self.soup("")
+ s = soup.new_string("foo", Comment)
+ self.assertEqual("foo", s)
+ self.assertTrue(isinstance(s, Comment))
+
class TestTreeModification(SoupTest):
def test_attribute_modification(self):
@@ -1181,7 +1187,6 @@
soup = self.soup("foo<!--IGNORE-->bar")
self.assertEqual(['foo', 'bar'], list(soup.strings))
-
class TestCDAtaListAttributes(SoupTest):
"""Testing cdata-list attributes like 'class'.
@@ -1344,18 +1349,6 @@
encoded = BeautifulSoup(doc).encode()
self.assertTrue(b"< < hey > >" in encoded)
- def test_formatter_processes_script_tag_for_xml_documents(self):
- doc = """
- <script type="text/javascript">
- </script>
-"""
- soup = BeautifulSoup(doc, "xml")
- # lxml would have stripped this while parsing, but we can add
- # it later.
- soup.script.string = 'console.log("< < hey > > ");'
- encoded = soup.encode()
- self.assertTrue(b"< < hey > >" in encoded)
-
def test_prettify_leaves_preformatted_text_alone(self):
soup = self.soup("<div> foo <pre> \tbar\n \n </pre> baz ")
# Everything outside the <pre> tag is reformatted, but everything
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/doc/source/index.rst new/beautifulsoup4-4.2.1/doc/source/index.rst
--- old/beautifulsoup4-4.2.0/doc/source/index.rst 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/doc/source/index.rst 2013-05-20 16:18:05.000000000 +0200
@@ -239,10 +239,10 @@
:kbd:`$ pip install lxml`
-If you're using Python 2, another alternative is the pure-Python
-`html5lib parser <http://code.google.com/p/html5lib/>`_, which parses
-HTML the way a web browser does. Depending on your setup, you might
-install html5lib with one of these commands:
+Another alternative is the pure-Python `html5lib parser
+<http://code.google.com/p/html5lib/>`_, which parses HTML the way a
+web browser does. Depending on your setup, you might install html5lib
+with one of these commands:
:kbd:`$ apt-get install python-html5lib`
@@ -270,7 +270,7 @@
| html5lib             | ``BeautifulSoup(markup, "html5lib")``      | * Extremely lenient            | * Very slow              |
|                      |                                            | * Parses pages the same way a  | * External Python        |
|                      |                                            |   web browser does             |   dependency             |
-|                      |                                            | * Creates valid HTML5          | * Python 2 only          |
+|                      |                                            | * Creates valid HTML5          |                          |
+----------------------+--------------------------------------------+--------------------------------+--------------------------+
If you can, I recommend you install and use lxml for speed. If you're
@@ -1806,6 +1806,20 @@
tag.contents
# [u'Hello', u' there']
+If you want to create a comment or some other subclass of
+``NavigableString``, pass that class as the second argument to
+``new_string()``::
+
+ from bs4 import Comment
+ new_comment = soup.new_string("Nice to see you.", Comment)
+ tag.append(new_comment)
+ tag
+ # <b>Hello there<!--Nice to see you.--></b>
+ tag.contents
+ # [u'Hello', u' there', u'Nice to see you.']
+
+(This is a new feature in Beautiful Soup 4.2.1.)
+
What if you need to create a whole new tag? The best solution is to
call the factory method ``BeautifulSoup.new_tag()``::
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.2.0/setup.py new/beautifulsoup4-4.2.1/setup.py
--- old/beautifulsoup4-4.2.0/setup.py 2013-05-15 14:36:50.000000000 +0200
+++ new/beautifulsoup4-4.2.1/setup.py 2013-05-31 15:52:01.000000000 +0200
@@ -7,7 +7,7 @@
from distutils.command.build_py import build_py
setup(name="beautifulsoup4",
- version = "4.2.0",
+ version = "4.2.1",
author="Leonard Richardson",
author_email='[email protected]',
url="http://www.crummy.com/software/BeautifulSoup/bs4/",
--
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]