Hello community,
here is the log from the commit of package python-beautifulsoup4 for
openSUSE:Factory checked in at 2015-11-17 14:23:36
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-beautifulsoup4 (Old)
and /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-beautifulsoup4"
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-beautifulsoup4/python-beautifulsoup4.changes  2015-08-10 09:15:53.000000000 +0200
+++ /work/SRC/openSUSE:Factory/.python-beautifulsoup4.new/python-beautifulsoup4.changes  2015-11-17 14:23:37.000000000 +0100
@@ -1,0 +2,15 @@
+Sun Nov 15 16:31:46 UTC 2015 - [email protected]
+
+- Update to version 4.4.1
+ * Fixed a bug that deranged the tree when part of it was
+ removed. Thanks to Eric Weiser for the patch and John Wiseman for a
+ test. lp#1481520
+ * Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
+ Kramer for the patch. lp#1483781
+ * Improved the implementation of CSS selector grouping. Thanks to
+ Orangain for the patch. lp#1484543
+ * Fixed the test_detect_utf8 test so that it works when chardet is
+ installed. lp#1471359
+ * Corrected the output of Declaration objects. lp#1477847
+
+-------------------------------------------------------------------
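As a quick illustration of the first entry above (lp#1481520): repeatedly
extracting tags could leave the internal element links around the removed
nodes in an inconsistent state. A minimal sketch against the public bs4 API
(the markup is invented for the example and is not a guaranteed reproduction
of the 4.4.0 bug; it only shows the kind of operation that was affected):

    from bs4 import BeautifulSoup

    html = ("<html><head><script>foo</script></head>"
            "<body><script>bar</script><a></a></body></html>")
    soup = BeautifulSoup(html, "html.parser")

    # Remove every <script> element; after the fix the rest of the tree,
    # including its sibling/parent links, stays intact.
    for script in soup.find_all("script"):
        script.extract()

    print(soup.body)   # <body><a></a></body>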
Old:
----
beautifulsoup4-4.4.0.tar.gz
New:
----
beautifulsoup4-4.4.1.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-beautifulsoup4.spec ++++++
--- /var/tmp/diff_new_pack.PhJNU9/_old 2015-11-17 14:23:37.000000000 +0100
+++ /var/tmp/diff_new_pack.PhJNU9/_new 2015-11-17 14:23:37.000000000 +0100
@@ -17,7 +17,7 @@
Name: python-beautifulsoup4
-Version: 4.4.0
+Version: 4.4.1
Release: 0
Summary: HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
License: MIT
++++++ beautifulsoup4-4.4.0.tar.gz -> beautifulsoup4-4.4.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/NEWS.txt new/beautifulsoup4-4.4.1/NEWS.txt
--- old/beautifulsoup4-4.4.0/NEWS.txt 2015-07-03 17:14:43.000000000 +0200
+++ new/beautifulsoup4-4.4.1/NEWS.txt 2015-09-29 01:53:36.000000000 +0200
@@ -1,3 +1,21 @@
+= 4.4.1 (20150928) =
+
+* Fixed a bug that deranged the tree when part of it was
+ removed. Thanks to Eric Weiser for the patch and John Wiseman for a
+ test. [bug=1481520]
+
+* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
+ Kramer for the patch. [bug=1483781]
+
+* Improved the implementation of CSS selector grouping. Thanks to
+ Orangain for the patch. [bug=1484543]
+
+* Fixed the test_detect_utf8 test so that it works when chardet is
+ installed. [bug=1471359]
+
+* Corrected the output of Declaration objects. [bug=1477847]
+
+
= 4.4.0 (20150703) =
Especially important changes:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/PKG-INFO new/beautifulsoup4-4.4.1/PKG-INFO
--- old/beautifulsoup4-4.4.0/PKG-INFO 2015-07-03 17:23:03.000000000 +0200
+++ new/beautifulsoup4-4.4.1/PKG-INFO 2015-09-29 02:19:48.000000000 +0200
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: beautifulsoup4
-Version: 4.4.0
+Version: 4.4.1
Summary: Screen-scraping library
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/beautifulsoup4.egg-info/PKG-INFO new/beautifulsoup4-4.4.1/beautifulsoup4.egg-info/PKG-INFO
--- old/beautifulsoup4-4.4.0/beautifulsoup4.egg-info/PKG-INFO 2015-07-03 17:23:03.000000000 +0200
+++ new/beautifulsoup4-4.4.1/beautifulsoup4.egg-info/PKG-INFO 2015-09-29 02:19:48.000000000 +0200
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: beautifulsoup4
-Version: 4.4.0
+Version: 4.4.1
Summary: Screen-scraping library
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/__init__.py new/beautifulsoup4-4.4.1/bs4/__init__.py
--- old/beautifulsoup4-4.4.0/bs4/__init__.py 2015-07-03 14:39:51.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/__init__.py 2015-09-29 02:09:17.000000000 +0200
@@ -17,7 +17,7 @@
"""
__author__ = "Leonard Richardson ([email protected])"
-__version__ = "4.4.0"
+__version__ = "4.4.1"
__copyright__ = "Copyright (c) 2004-2015 Leonard Richardson"
__license__ = "MIT"
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/builder/_html5lib.py new/beautifulsoup4-4.4.1/bs4/builder/_html5lib.py
--- old/beautifulsoup4-4.4.0/bs4/builder/_html5lib.py 2015-06-28 21:37:27.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/builder/_html5lib.py 2015-09-29 01:48:58.000000000 +0200
@@ -120,7 +120,10 @@
if (name in list_attr['*']
or (self.element.name in list_attr
and name in list_attr[self.element.name])):
- value = whitespace_re.split(value)
+ # A node that is being cloned may have already undergone
+ # this procedure.
+ if not isinstance(value, list):
+ value = whitespace_re.split(value)
self.element[name] = value
def items(self):
return list(self.attrs.items())
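The isinstance() guard added above matters because html5lib can clone a node
while repairing misnested markup, and on the clone a multi-valued attribute
such as "class" has already been turned into a list. A standalone sketch of
the idea (whitespace_re and the helper below are recreated for illustration,
not imported from bs4):

    import re

    whitespace_re = re.compile(r"\s+")

    def store_attribute(attrs, name, value):
        # A node that is being cloned may have already undergone this
        # procedure, so only split values that are still plain strings.
        if not isinstance(value, list):
            value = whitespace_re.split(value)
        attrs[name] = value

    attrs = {}
    store_attribute(attrs, "class", "my_class other")  # ['my_class', 'other']
    store_attribute(attrs, "class", attrs["class"])    # stays a list, no TypeError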
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/dammit.py new/beautifulsoup4-4.4.1/bs4/dammit.py
--- old/beautifulsoup4-4.4.0/bs4/dammit.py 2015-07-03 15:20:06.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/dammit.py 2015-09-29 01:58:41.000000000 +0200
@@ -6,6 +6,7 @@
Feed Parser. It works best on XML and HTML, but it does not rewrite the
XML or HTML to reflect a new encoding; that's the tree builder's job.
"""
+__license__ = "MIT"
from pdb import set_trace
import codecs
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/diagnose.py new/beautifulsoup4-4.4.1/bs4/diagnose.py
--- old/beautifulsoup4-4.4.0/bs4/diagnose.py 2015-06-27 17:11:31.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/diagnose.py 2015-09-29 01:56:24.000000000 +0200
@@ -1,4 +1,7 @@
"""Diagnostic functions, mainly for use when doing tech support."""
+
+__license__ = "MIT"
+
import cProfile
from StringIO import StringIO
from HTMLParser import HTMLParser
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/element.py new/beautifulsoup4-4.4.1/bs4/element.py
--- old/beautifulsoup4-4.4.0/bs4/element.py 2015-06-28 20:57:57.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/element.py 2015-09-29 01:56:01.000000000 +0200
@@ -1,3 +1,5 @@
+__license__ = "MIT"
+
from pdb import set_trace
import collections
import re
@@ -262,19 +264,19 @@
next_element = last_child.next_element
if (self.previous_element is not None and
- self.previous_element != next_element):
+ self.previous_element is not next_element):
self.previous_element.next_element = next_element
- if next_element is not None and next_element != self.previous_element:
+ if next_element is not None and next_element is not self.previous_element:
next_element.previous_element = self.previous_element
self.previous_element = None
last_child.next_element = None
self.parent = None
if (self.previous_sibling is not None
- and self.previous_sibling != self.next_sibling):
+ and self.previous_sibling is not self.next_sibling):
self.previous_sibling.next_sibling = self.next_sibling
if (self.next_sibling is not None
- and self.next_sibling != self.previous_sibling):
+ and self.next_sibling is not self.previous_sibling):
self.next_sibling.previous_sibling = self.previous_sibling
self.previous_sibling = self.next_sibling = None
return self
@@ -287,13 +289,15 @@
last_child = self
while isinstance(last_child, Tag) and last_child.contents:
last_child = last_child.contents[-1]
- if not accept_self and last_child == self:
+ if not accept_self and last_child is self:
last_child = None
return last_child
# BS3: Not part of the API!
_lastRecursiveChild = _last_descendant
def insert(self, position, new_child):
+ if new_child is None:
+ raise ValueError("Cannot insert None into a tag.")
if new_child is self:
raise ValueError("Cannot insert a tag into itself.")
if (isinstance(new_child, basestring)
@@ -750,8 +754,8 @@
class Declaration(PreformattedString):
- PREFIX = u'<!'
- SUFFIX = u'!>'
+ PREFIX = u'<?'
+ SUFFIX = u'?>'
class Doctype(PreformattedString):
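In practice this means an XML declaration is serialized with the correct
delimiters again; the new test_declaration test later in this diff checks
exactly that. A tiny sketch using the class directly (Declaration lives in
bs4.element):

    from bs4.element import Declaration

    d = Declaration("foo")
    print(d.output_ready())   # 4.4.1 prints '<?foo?>'; 4.4.0 produced '<!foo!>'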
@@ -1286,9 +1290,23 @@
def select(self, selector, _candidate_generator=None, limit=None):
"""Perform a CSS selection operation on the current element."""
- # Remove whitespace directly after the grouping operator ','
- # then split into tokens.
- tokens = re.sub(',[\s]*',',', selector).split()
+ # Handle grouping selectors if ',' exists, ie: p,a
+ if ',' in selector:
+ context = []
+ for partial_selector in selector.split(','):
+ partial_selector = partial_selector.strip()
+ if partial_selector == '':
+ raise ValueError('Invalid group selection syntax: %s' % selector)
+ candidates = self.select(partial_selector, limit=limit)
+ for candidate in candidates:
+ if candidate not in context:
+ context.append(candidate)
+
+ if limit and len(context) >= limit:
+ break
+ return context
+
+ tokens = selector.split()
current_context = [self]
if tokens[-1] in self._selector_combinators:
@@ -1298,198 +1316,192 @@
if self._select_debug:
print 'Running CSS selector "%s"' % selector
- for index, token_group in enumerate(tokens):
+ for index, token in enumerate(tokens):
new_context = []
new_context_ids = set([])
- # Grouping selectors, ie: p,a
- grouped_tokens = token_group.split(',')
- if '' in grouped_tokens:
- raise ValueError('Invalid group selection syntax: %s' % token_group)
-
if tokens[index-1] in self._selector_combinators:
# This token was consumed by the previous combinator. Skip it.
if self._select_debug:
print ' Token was consumed by the previous combinator.'
continue
- for token in grouped_tokens:
- if self._select_debug:
- print ' Considering token "%s"' % token
- recursive_candidate_generator = None
- tag_name = None
-
- # Each operation corresponds to a checker function, a rule
- # for determining whether a candidate matches the
- # selector. Candidates are generated by the active
- # iterator.
- checker = None
-
- m = self.attribselect_re.match(token)
- if m is not None:
- # Attribute selector
- tag_name, attribute, operator, value = m.groups()
- checker = self._attribute_checker(operator, attribute, value)
-
- elif '#' in token:
- # ID selector
- tag_name, tag_id = token.split('#', 1)
- def id_matches(tag):
- return tag.get('id', None) == tag_id
- checker = id_matches
-
- elif '.' in token:
- # Class selector
- tag_name, klass = token.split('.', 1)
- classes = set(klass.split('.'))
- def classes_match(candidate):
- return classes.issubset(candidate.get('class', []))
- checker = classes_match
-
- elif ':' in token:
- # Pseudo-class
- tag_name, pseudo = token.split(':', 1)
- if tag_name == '':
- raise ValueError(
- "A pseudo-class must be prefixed with a tag name.")
- pseudo_attributes = re.match('([a-zA-Z\d-]+)\(([a-zA-Z\d]+)\)', pseudo)
- found = []
- if pseudo_attributes is None:
- pseudo_type = pseudo
- pseudo_value = None
- else:
- pseudo_type, pseudo_value = pseudo_attributes.groups()
- if pseudo_type == 'nth-of-type':
- try:
- pseudo_value = int(pseudo_value)
- except:
- raise NotImplementedError(
- 'Only numeric values are currently supported for the nth-of-type pseudo-class.')
- if pseudo_value < 1:
- raise ValueError(
- 'nth-of-type pseudo-class value must be at least 1.')
- class Counter(object):
- def __init__(self, destination):
- self.count = 0
- self.destination = destination
-
- def nth_child_of_type(self, tag):
- self.count += 1
- if self.count == self.destination:
- return True
- if self.count > self.destination:
- # Stop the generator that's sending us
- # these things.
- raise StopIteration()
- return False
- checker = Counter(pseudo_value).nth_child_of_type
- else:
+ if self._select_debug:
+ print ' Considering token "%s"' % token
+ recursive_candidate_generator = None
+ tag_name = None
+
+ # Each operation corresponds to a checker function, a rule
+ # for determining whether a candidate matches the
+ # selector. Candidates are generated by the active
+ # iterator.
+ checker = None
+
+ m = self.attribselect_re.match(token)
+ if m is not None:
+ # Attribute selector
+ tag_name, attribute, operator, value = m.groups()
+ checker = self._attribute_checker(operator, attribute, value)
+
+ elif '#' in token:
+ # ID selector
+ tag_name, tag_id = token.split('#', 1)
+ def id_matches(tag):
+ return tag.get('id', None) == tag_id
+ checker = id_matches
+
+ elif '.' in token:
+ # Class selector
+ tag_name, klass = token.split('.', 1)
+ classes = set(klass.split('.'))
+ def classes_match(candidate):
+ return classes.issubset(candidate.get('class', []))
+ checker = classes_match
+
+ elif ':' in token:
+ # Pseudo-class
+ tag_name, pseudo = token.split(':', 1)
+ if tag_name == '':
+ raise ValueError(
+ "A pseudo-class must be prefixed with a tag name.")
+ pseudo_attributes = re.match('([a-zA-Z\d-]+)\(([a-zA-Z\d]+)\)', pseudo)
+ found = []
+ if pseudo_attributes is None:
+ pseudo_type = pseudo
+ pseudo_value = None
+ else:
+ pseudo_type, pseudo_value = pseudo_attributes.groups()
+ if pseudo_type == 'nth-of-type':
+ try:
+ pseudo_value = int(pseudo_value)
+ except:
raise NotImplementedError(
- 'Only the following pseudo-classes are implemented: nth-of-type.')
-
- elif token == '*':
- # Star selector -- matches everything
- pass
- elif token == '>':
- # Run the next token as a CSS selector against the
- # direct children of each tag in the current context.
- recursive_candidate_generator = lambda tag: tag.children
- elif token == '~':
- # Run the next token as a CSS selector against the
- # siblings of each tag in the current context.
- recursive_candidate_generator = lambda tag: tag.next_siblings
- elif token == '+':
- # For each tag in the current context, run the next
- # token as a CSS selector against the tag's next
- # sibling that's a tag.
- def next_tag_sibling(tag):
- yield tag.find_next_sibling(True)
- recursive_candidate_generator = next_tag_sibling
-
- elif self.tag_name_re.match(token):
- # Just a tag name.
- tag_name = token
+ 'Only numeric values are currently supported for the nth-of-type pseudo-class.')
+ if pseudo_value < 1:
+ raise ValueError(
+ 'nth-of-type pseudo-class value must be at least 1.')
+ class Counter(object):
+ def __init__(self, destination):
+ self.count = 0
+ self.destination = destination
+
+ def nth_child_of_type(self, tag):
+ self.count += 1
+ if self.count == self.destination:
+ return True
+ if self.count > self.destination:
+ # Stop the generator that's sending us
+ # these things.
+ raise StopIteration()
+ return False
+ checker = Counter(pseudo_value).nth_child_of_type
else:
- raise ValueError(
- 'Unsupported or invalid CSS selector: "%s"' % token)
- if recursive_candidate_generator:
- # This happens when the selector looks like "> foo".
- #
- # The generator calls select() recursively on every
- # member of the current context, passing in a different
- # candidate generator and a different selector.
- #
- # In the case of "> foo", the candidate generator is
- # one that yields a tag's direct children (">"), and
- # the selector is "foo".
- next_token = tokens[index+1]
- def recursive_select(tag):
- if self._select_debug:
- print ' Calling select("%s") recursively on %s %s' % (next_token, tag.name, tag.attrs)
- print '-' * 40
- for i in tag.select(next_token, recursive_candidate_generator):
- if self._select_debug:
- print '(Recursive select picked up candidate %s %s)' % (i.name, i.attrs)
- yield i
- if self._select_debug:
- print '-' * 40
- _use_candidate_generator = recursive_select
- elif _candidate_generator is None:
- # By default, a tag's candidates are all of its
- # children. If tag_name is defined, only yield tags
- # with that name.
+ raise NotImplementedError(
+ 'Only the following pseudo-classes are implemented: nth-of-type.')
+
+ elif token == '*':
+ # Star selector -- matches everything
+ pass
+ elif token == '>':
+ # Run the next token as a CSS selector against the
+ # direct children of each tag in the current context.
+ recursive_candidate_generator = lambda tag: tag.children
+ elif token == '~':
+ # Run the next token as a CSS selector against the
+ # siblings of each tag in the current context.
+ recursive_candidate_generator = lambda tag: tag.next_siblings
+ elif token == '+':
+ # For each tag in the current context, run the next
+ # token as a CSS selector against the tag's next
+ # sibling that's a tag.
+ def next_tag_sibling(tag):
+ yield tag.find_next_sibling(True)
+ recursive_candidate_generator = next_tag_sibling
+
+ elif self.tag_name_re.match(token):
+ # Just a tag name.
+ tag_name = token
+ else:
+ raise ValueError(
+ 'Unsupported or invalid CSS selector: "%s"' % token)
+ if recursive_candidate_generator:
+ # This happens when the selector looks like "> foo".
+ #
+ # The generator calls select() recursively on every
+ # member of the current context, passing in a different
+ # candidate generator and a different selector.
+ #
+ # In the case of "> foo", the candidate generator is
+ # one that yields a tag's direct children (">"), and
+ # the selector is "foo".
+ next_token = tokens[index+1]
+ def recursive_select(tag):
if self._select_debug:
- if tag_name:
- check = "[any]"
- else:
- check = tag_name
- print ' Default candidate generator, tag name="%s"' % check
+ print ' Calling select("%s") recursively on %s %s' % (next_token, tag.name, tag.attrs)
+ print '-' * 40
+ for i in tag.select(next_token, recursive_candidate_generator):
+ if self._select_debug:
+ print '(Recursive select picked up candidate %s %s)' % (i.name, i.attrs)
+ yield i
if self._select_debug:
- # This is redundant with later code, but it stops
- # a bunch of bogus tags from cluttering up the
- # debug log.
- def default_candidate_generator(tag):
- for child in tag.descendants:
- if not isinstance(child, Tag):
- continue
- if tag_name and not child.name == tag_name:
- continue
- yield child
- _use_candidate_generator = default_candidate_generator
+ print '-' * 40
+ _use_candidate_generator = recursive_select
+ elif _candidate_generator is None:
+ # By default, a tag's candidates are all of its
+ # children. If tag_name is defined, only yield tags
+ # with that name.
+ if self._select_debug:
+ if tag_name:
+ check = "[any]"
else:
- _use_candidate_generator = lambda tag: tag.descendants
+ check = tag_name
+ print ' Default candidate generator, tag name="%s"' % check
+ if self._select_debug:
+ # This is redundant with later code, but it stops
+ # a bunch of bogus tags from cluttering up the
+ # debug log.
+ def default_candidate_generator(tag):
+ for child in tag.descendants:
+ if not isinstance(child, Tag):
+ continue
+ if tag_name and not child.name == tag_name:
+ continue
+ yield child
+ _use_candidate_generator = default_candidate_generator
else:
- _use_candidate_generator = _candidate_generator
+ _use_candidate_generator = lambda tag: tag.descendants
+ else:
+ _use_candidate_generator = _candidate_generator
- count = 0
- for tag in current_context:
- if self._select_debug:
- print " Running candidate generator on %s %s" % (
- tag.name, repr(tag.attrs))
- for candidate in _use_candidate_generator(tag):
- if not isinstance(candidate, Tag):
- continue
- if tag_name and candidate.name != tag_name:
- continue
- if checker is not None:
- try:
- result = checker(candidate)
- except StopIteration:
- # The checker has decided we should no longer
- # run the generator.
+ count = 0
+ for tag in current_context:
+ if self._select_debug:
+ print " Running candidate generator on %s %s" % (
+ tag.name, repr(tag.attrs))
+ for candidate in _use_candidate_generator(tag):
+ if not isinstance(candidate, Tag):
+ continue
+ if tag_name and candidate.name != tag_name:
+ continue
+ if checker is not None:
+ try:
+ result = checker(candidate)
+ except StopIteration:
+ # The checker has decided we should no longer
+ # run the generator.
+ break
+ if checker is None or result:
+ if self._select_debug:
+ print " SUCCESS %s %s" % (candidate.name,
repr(candidate.attrs))
+ if id(candidate) not in new_context_ids:
+ # If a tag matches a selector more than once,
+ # don't include it in the context more than once.
+ new_context.append(candidate)
+ new_context_ids.add(id(candidate))
+ if limit and len(new_context) >= limit:
break
- if checker is None or result:
- if self._select_debug:
- print " SUCCESS %s %s" % (candidate.name,
repr(candidate.attrs))
- if id(candidate) not in new_context_ids:
- # If a tag matches a selector more than once,
- # don't include it in the context more than
once.
- new_context.append(candidate)
- new_context_ids.add(id(candidate))
- if limit and len(new_context) >= limit:
- break
- elif self._select_debug:
- print " FAILURE %s %s" % (candidate.name,
repr(candidate.attrs))
+ elif self._select_debug:
+ print " FAILURE %s %s" % (candidate.name,
repr(candidate.attrs))
current_context = new_context
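The net effect of the select() rewrite above: the grouping comma is now
handled up front, each comma-separated selector is evaluated on its own, and
the results are merged without duplicates, instead of the comma being handled
token by token inside the main loop. A small usage sketch against the public
API (the markup is invented for the example):

    from bs4 import BeautifulSoup

    html = "<div><p class='a'>one</p><span>two</span><p>three</p></div>"
    soup = BeautifulSoup(html, "html.parser")

    # Each group is run as a complete selector of its own.
    print(soup.select("span, p.a"))   # the <span>, then the first <p>
    # A tag matched by more than one group is returned only once.
    print(soup.select("p, p.a"))      # both <p> tags, no duplicate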
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/testing.py new/beautifulsoup4-4.4.1/bs4/testing.py
--- old/beautifulsoup4-4.4.0/bs4/testing.py 2015-06-28 21:51:27.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/testing.py 2015-09-29 01:56:34.000000000 +0200
@@ -1,5 +1,7 @@
"""Helper classes for tests."""
+__license__ = "MIT"
+
import pickle
import copy
import functools
@@ -556,6 +558,11 @@
self.assertEqual(
soup.encode(), b'<?xml version="1.0" encoding="utf-8"?>\n<root/>')
+ def test_xml_declaration(self):
+ markup = b"""<?xml version="1.0" encoding="utf8"?>\n<foo/>"""
+ soup = self.soup(markup)
+ self.assertEqual(markup, soup.encode("utf8"))
+
def test_real_xhtml_document(self):
"""A real XHTML document should come out *exactly* the same as it went
in."""
markup = b"""<?xml version="1.0" encoding="utf-8"?>
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/tests/test_html5lib.py new/beautifulsoup4-4.4.1/bs4/tests/test_html5lib.py
--- old/beautifulsoup4-4.4.0/bs4/tests/test_html5lib.py 2014-12-12 04:21:39.000000000 +0100
+++ new/beautifulsoup4-4.4.1/bs4/tests/test_html5lib.py 2015-09-29 01:51:22.000000000 +0200
@@ -89,3 +89,10 @@
markup = b"""<?PITarget PIContent?>"""
soup = self.soup(markup)
assert str(soup).startswith("<!--?PITarget PIContent?-->")
+
+ def test_cloned_multivalue_node(self):
+ markup = b"""<a class="my_class"><p></a>"""
+ soup = self.soup(markup)
+ a1, a2 = soup.find_all('a')
+ self.assertEqual(a1, a2)
+ assert a1 is not a2
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/tests/test_soup.py new/beautifulsoup4-4.4.1/bs4/tests/test_soup.py
--- old/beautifulsoup4-4.4.0/bs4/tests/test_soup.py 2015-06-27 15:30:31.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/tests/test_soup.py 2015-07-05 19:19:39.000000000 +0200
@@ -299,10 +299,11 @@
dammit.unicode_markup, """<foo>''""</foo>""")
def test_detect_utf8(self):
- utf8 = b"\xc3\xa9"
+ utf8 = b"Sacr\xc3\xa9 bleu! \xe2\x98\x83"
dammit = UnicodeDammit(utf8)
- self.assertEqual(dammit.unicode_markup, u'\xe9')
self.assertEqual(dammit.original_encoding.lower(), 'utf-8')
+ self.assertEqual(dammit.unicode_markup, u'Sacr\xe9 bleu! \N{SNOWMAN}')
+
def test_convert_hebrew(self):
hebrew = b"\xed\xe5\xec\xf9"
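The change above removes the test's dependence on whether chardet is
installed: a two-byte input like b"\xc3\xa9" is presumably too short for
chardet to classify reliably, while the longer string is detected as UTF-8
either way. A quick sketch of the API under test (UnicodeDammit lives in
bs4.dammit):

    from bs4.dammit import UnicodeDammit

    dammit = UnicodeDammit(b"Sacr\xc3\xa9 bleu! \xe2\x98\x83")
    print(dammit.original_encoding)   # detected as utf-8, with or without chardet
    assert dammit.unicode_markup == u"Sacr\xe9 bleu! \u2603"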
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/bs4/tests/test_tree.py new/beautifulsoup4-4.4.1/bs4/tests/test_tree.py
--- old/beautifulsoup4-4.4.0/bs4/tests/test_tree.py 2015-06-28 21:50:14.000000000 +0200
+++ new/beautifulsoup4-4.4.1/bs4/tests/test_tree.py 2015-09-29 01:42:21.000000000 +0200
@@ -23,6 +23,7 @@
PY3K,
CData,
Comment,
+ Declaration,
Doctype,
NavigableString,
SoupStrainer,
@@ -1084,6 +1085,31 @@
self.assertEqual(foo_2, soup.a.string)
self.assertEqual(bar_2, soup.b.string)
+ def test_extract_multiples_of_same_tag(self):
+ soup = self.soup("""
+<html>
+<head>
+<script>foo</script>
+</head>
+<body>
+ <script>bar</script>
+ <a></a>
+</body>
+<script>baz</script>
+</html>""")
+ [soup.script.extract() for i in soup.find_all("script")]
+ self.assertEqual("<body>\n\n<a></a>\n</body>", unicode(soup.body))
+
+
+ def test_extract_works_when_element_is_surrounded_by_identical_strings(self):
+ soup = self.soup(
+ '<html>\n'
+ '<body>hi</body>\n'
+ '</html>')
+ soup.find('body').extract()
+ self.assertEqual(None, soup.find('body'))
+
+
def test_clear(self):
"""Tag.clear()"""
soup = self.soup("<p><a>String <em>Italicized</em></a> and
another</p>")
@@ -1592,6 +1618,9 @@
soup.insert(1, doctype)
self.assertEqual(soup.encode(), b"<!DOCTYPE foo>\n")
+ def test_declaration(self):
+ d = Declaration("foo")
+ self.assertEqual("<?foo?>", d.output_ready())
class TestSoupSelector(TreeTest):
@@ -1942,22 +1971,25 @@
# Test the selector grouping operator (the comma)
def test_multiple_select(self):
- self.assertSelects('x, y',['xid','yid'])
+ self.assertSelects('x, y', ['xid', 'yid'])
def test_multiple_select_with_no_space(self):
- self.assertSelects('x,y',['xid','yid'])
+ self.assertSelects('x,y', ['xid', 'yid'])
def test_multiple_select_with_more_space(self):
- self.assertSelects('x, y',['xid', 'yid'])
+ self.assertSelects('x, y', ['xid', 'yid'])
+
+ def test_multiple_select_duplicated(self):
+ self.assertSelects('x, x', ['xid'])
def test_multiple_select_sibling(self):
- self.assertSelects('x, y ~ p[lang=fr]',['lang-fr'])
+ self.assertSelects('x, y ~ p[lang=fr]', ['xid', 'lang-fr'])
- def test_multiple_select(self):
- self.assertSelects('x, y > z', ['zida', 'zidb', 'zidab', 'zidac'])
+ def test_multiple_select_tag_and_direct_descendant(self):
+ self.assertSelects('x, y > z', ['xid', 'zidb'])
- def test_multiple_select_direct_descendant(self):
- self.assertSelects('div > x, y, z', ['xid', 'yid'])
+ def test_multiple_select_direct_descendant_and_tags(self):
+ self.assertSelects('div > x, y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
def test_multiple_select_indirect_descendant(self):
self.assertSelects('div x,y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
@@ -1966,14 +1998,14 @@
self.assertRaises(ValueError, self.soup.select, ',x, y')
self.assertRaises(ValueError, self.soup.select, 'x,,y')
- def test_multiple_select(self):
- self.assertSelects('p[lang=en], p[lang=en-gb]',['lang-en','lang-en-gb'])
+ def test_multiple_select_attrs(self):
+ self.assertSelects('p[lang=en], p[lang=en-gb]', ['lang-en', 'lang-en-gb'])
def test_multiple_select_ids(self):
- self.assertSelects('x, y > z[id=zida], z[id=zidab], z[id=zidb]', ['zida', 'zidb','zidab'])
+ self.assertSelects('x, y > z[id=zida], z[id=zidab], z[id=zidb]', ['xid', 'zidb', 'zidab'])
def test_multiple_select_nested(self):
- self.assertSelects('body > div > x, y > z', ['zida', 'zidb', 'zidab', 'zidac'])
+ self.assertSelects('body > div > x, y > z', ['xid', 'zidb'])
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/doc/source/conf.py new/beautifulsoup4-4.4.1/doc/source/conf.py
--- old/beautifulsoup4-4.4.0/doc/source/conf.py 2013-05-14 14:20:54.000000000 +0200
+++ new/beautifulsoup4-4.4.1/doc/source/conf.py 2015-07-03 17:31:12.000000000 +0200
@@ -41,7 +41,7 @@
# General information about the project.
project = u'Beautiful Soup'
-copyright = u'2012, Leonard Richardson'
+copyright = u'2004-2015, Leonard Richardson'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
@@ -50,7 +50,7 @@
# The short X.Y version.
version = '4'
# The full version, including alpha/beta/rc tags.
-release = '4.2.0'
+release = '4.4.0'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/doc/source/index.rst new/beautifulsoup4-4.4.1/doc/source/index.rst
--- old/beautifulsoup4-4.4.0/doc/source/index.rst 2015-06-28 21:33:53.000000000 +0200
+++ new/beautifulsoup4-4.4.1/doc/source/index.rst 2015-09-29 00:46:53.000000000 +0200
@@ -29,7 +29,7 @@
This documentation has been translated into other languages by
Beautiful Soup users:
-* `这篇文档当然还有中文版. <http://www.crummy.com/software/BeautifulSoup/bs4/doc/index.cn.html>`_
+* `这篇文档当然还有中文版. <http://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/>`_
* このページは日本語で利用できます(`外部リンク <http://kondou.com/BS4/>`_)
* 이 문서는 한국어 번역도 가능합니다. (`외부 링크 <http://coreapython.hosting.paran.com/etc/beautifulsoup4.html>`_)
@@ -1130,7 +1130,7 @@
If you pass in a function to filter on a specific attribute like
``href``, the argument passed into the function will be the attribute
value, not the whole tag. Here's a function that finds all ``a`` tags
-whose ``href`` attribute _does not_ match a regular expression::
+whose ``href`` attribute *does not* match a regular expression::
def not_lacie(href):
return href and not re.compile("lacie").search(href)
@@ -1359,6 +1359,12 @@
soup.find_all("a", string="Elsie")
# [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]
+The ``string`` argument is new in Beautiful Soup 4.4.0. In earlier
+versions it was called ``text``::
+
+ soup.find_all("a", text="Elsie")
+ # [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]
+
.. _limit:
The ``limit`` argument
@@ -3120,11 +3126,11 @@
their values, not strings. This may affect the way you search by CSS
class.
-If you pass one of the ``find*`` methods both :ref:`text <text>` `and`
+If you pass one of the ``find*`` methods both :ref:`string <string>` `and`
a tag-specific argument like :ref:`name <name>`, Beautiful Soup will
search for tags that match your tag-specific criteria and whose
-:ref:`Tag.string <.string>` matches your value for :ref:`text
-<text>`. It will `not` find the strings themselves. Previously,
+:ref:`Tag.string <.string>` matches your value for :ref:`string
+<string>`. It will `not` find the strings themselves. Previously,
Beautiful Soup ignored the tag-specific arguments and looked for
strings.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/beautifulsoup4-4.4.0/setup.py new/beautifulsoup4-4.4.1/setup.py
--- old/beautifulsoup4-4.4.0/setup.py 2015-07-03 17:18:16.000000000 +0200
+++ new/beautifulsoup4-4.4.1/setup.py 2015-09-29 02:11:15.000000000 +0200
@@ -5,7 +5,7 @@
setup(
name="beautifulsoup4",
- version = "4.4.0",
+ version = "4.4.1",
author="Leonard Richardson",
author_email='[email protected]',
url="http://www.crummy.com/software/BeautifulSoup/bs4/",