Package: planet-venus
Version: 0~git9de2109-4
Severity: important
Tags: patch

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Dear Maintainer,

After updating python-html5lib to 0.999999999-1, planet-venus fails
with:

ERROR:planet.runner:TypeError: __init__() got an unexpected keyword argument 'encoding'
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/planet/spider.py", line 484, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/planet/spider.py", line 293, in writeCache
    reconstitute.source(xdoc.documentElement,data.feed,data.bozo,format)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/planet/reconstitute.py", line 240, in source
    content(xsource, 'subtitle', source.get('subtitle_detail',None), bozo)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/planet/reconstitute.py", line 170, in content
    html = parser.parse(xdiv % detail.value, encoding="utf-8")
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/_inputstream.py", line 151, in HTMLInputStream
    return HTMLBinaryInputStream(source, **kwargs)
Traceback (most recent call last):
  File "/usr/bin/planet", line 143, in <module>
    doc = splice.splice()
  File "/usr/lib/python2.7/dist-packages/planet/splice.py", line 84, in splice
    reconstitute.source(xdoc.documentElement, data.feed, None, None)
  File "/usr/lib/python2.7/dist-packages/planet/reconstitute.py", line 240, in source
    content(xsource, 'subtitle', source.get('subtitle_detail',None), bozo)
  File "/usr/lib/python2.7/dist-packages/planet/reconstitute.py", line 170, in content
    html = parser.parse(xdiv % detail.value, encoding="utf-8")
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/usr/lib/python2.7/dist-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/usr/lib/python2.7/dist-packages/html5lib/_inputstream.py", line 151, in HTMLInputStream
    return HTMLBinaryInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'encoding'

Fixing this exposes a second error in planet/scrub.py, which still passes
html5lib's sanitizer to the parser as a tokenizer. See [1] and [2].

The attached patch makes planet-venus work again. It should probably be
incorporated into debian/patches/html5lib-no_XHTMLSerializer.patch.
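
For illustration only (this is not part of the attached patch): if the
package ever had to run against both old and new html5lib releases, the
keyword rename could be bridged with a small wrapper along these lines.
The names parse_with_encoding, old_parse and new_parse are hypothetical;
the two stand-in functions merely mimic the old and new keyword names so
the sketch runs without html5lib installed.

```python
def parse_with_encoding(parse, data, encoding="utf-8"):
    """Call an html5lib-style parse function with whichever encoding
    keyword it accepts. Hypothetical helper, not in planet-venus."""
    try:
        # Newer html5lib: the keyword is `override_encoding`.
        return parse(data, override_encoding=encoding)
    except TypeError:
        # Older html5lib: the keyword was `encoding`.
        # Caveat: this also retries on a TypeError raised for other reasons.
        return parse(data, encoding=encoding)


# Stand-ins that only mimic the two keyword conventions (assumptions).
def old_parse(data, encoding=None):
    return ("old", data, encoding)


def new_parse(data, override_encoding=None):
    return ("new", data, override_encoding)


if __name__ == "__main__":
    print(parse_with_encoding(old_parse, "<p>x</p>"))
    print(parse_with_encoding(new_parse, "<p>x</p>"))
```

In reconstitute.py the call would then read something like
parse_with_encoding(parser.parse, xdiv % detail.value). Since Debian
ships a single html5lib version, the direct rename in the patch is the
simpler choice.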

Cheers,
sur5r

[1] https://github.com/html5lib/html5lib-python/issues/277
[2] https://github.com/html5lib/html5lib-python/issues/72



-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEe/X2rDZDH11A3BN6TPKyGPVNrj0FAlg65/MACgkQTPKyGPVN
rj1l6BAAqQyCb4TzzZ5ueiBhp5OTY7U5z+8SP4rquuD+4bMaSq6sZuDkwH/mk71E
+rXt5/EsUezRoIjvmRpOlP/1ANDNnidhoxz7OttHBiRWZQUZ/QG6HlSF4t3BOOUY
J87zTwMJJC0aM2CRod5K30EUX2eDnmbrEyMJ5DqL2aSl+V8I7tH+9ttTK7myeW25
C0y8S2D3GWCn3pjMh3PsKk6zEkX+3niERpXfXNHytlrYuBEJI4hG9xi6g7sHN9ds
dhaiopTbUonEQhHkpzKwmPc08IcMvwO/xTCecrtsiTGs1wRi5I7uxmRwySljVzDS
AuIm3cEz/Qy8SzDkDc7eWYrk7LxYE2vcJ4PZlNy75sSWoDsq0LYbmcHQq7vtrHhd
dlctzLSEx9v0MUtNcjz6iCCdFBnVdJS3VTLjCqmlt4p1c0LgbeZeuokmIhIb3s/Q
kClegb1wcuqcw3PKxMjZdUWEg7/gh84aDf/d2kb2+r+B54XXhysQM9eXpTPm24Hx
ushQZ99At/mxFEbY1UmlvUmMjfNdEV402riDUlKUGR7f+10dWvxY2cRRSZc+fXGj
cmAeT8xZa8aAZ2ou9Qmq/8/ixK9ez+A0VFgKBV69wqPzQx2fG3Omy3AY+/encjGp
cjF0QqpbRc5fswiNI9e7Y5b2E2R1kiSo6qduSB323ejYf0tQHAI=
=Lnir
-----END PGP SIGNATURE-----
--- a/planet/scrub.py	2016-02-17 00:00:00.000000000 +0100
+++ b/planet/scrub.py	2016-11-27 13:47:47.000000000 +0100
@@ -139,12 +139,12 @@
                         node['type']='text/html'
 
                 if not doc:
-                    from html5lib import html5parser, treebuilders, sanitizer
-                    p=html5parser.HTMLParser(tree=treebuilders.getTreeBuilder('dom'), tokenizer=sanitizer.HTMLSanitizer)
-                    doc = p.parseFragment(node['value'], encoding='utf-8')
+                    from html5lib import html5parser, treebuilders
+                    p=html5parser.HTMLParser(tree=treebuilders.getTreeBuilder('dom'))
+                    doc = p.parseFragment(node['value'])
 
                 from html5lib import treewalkers, serializer
                 walker = treewalkers.getTreeWalker('dom')(doc)
-                xhtml = serializer.HTMLSerializer(inject_meta_charset = False)
+                xhtml = serializer.HTMLSerializer(inject_meta_charset = False, sanitize=True)
                 tree = xhtml.serialize(walker, encoding='utf-8')
                 node['value'] = ''.join([str(token) for token in tree])
--- a/planet/reconstitute.py	2016-02-17 00:00:00.000000000 +0100
+++ b/planet/reconstitute.py	2016-11-27 13:47:50.000000000 +0100
@@ -167,7 +167,7 @@
 
     if detail.type.find('xhtml')<0 or bozo:
         parser = html5parser.HTMLParser(tree=treebuilders.getTreeBuilder('dom'))
-        html = parser.parse(xdiv % detail.value, encoding="utf-8")
+        html = parser.parse(xdiv % detail.value, override_encoding="utf-8")
         for body in html.documentElement.childNodes:
             if body.nodeType != Node.ELEMENT_NODE: continue
             if body.nodeName != 'body': continue
