Status: New
Owner: ----
New issue 210 by r.kin...@gmail.com: Sanitizer and lxml tree walker:
TypeError: unhashable type
http://code.google.com/p/html5lib/issues/detail?id=210
What steps will reproduce the problem?
from html5lib import HTMLParser
from html5lib.treebuilders import getTreeBuilder
from html5lib.treewalkers import getTreeWalker
from html5lib.filters.sanitizer import Filter as Sanitizer
html = "<html><body><h1>Header"
parser = HTMLParser(tree=getTreeBuilder("lxml"),
                    namespaceHTMLElements=False)
doc = parser.parse(html)
root = doc.getroot()
body = doc.xpath('/html/body')
walker = getTreeWalker('lxml')
stream = walker(body)
stream = Sanitizer(stream)
for token in stream:
    print token
What is the expected output? What do you see instead?
I do not know exactly what should be printed. Instead, an exception is
raised:
$ python t.py
{'namespace': u'None', 'type': 'Characters', 'data': u'<body>'}
Traceback (most recent call last):
  File "t.py", line 17, in <module>
    for token in stream:
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/sanitizer.py", line 7, in __iter__
    token = self.sanitize_token(token)
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/sanitizer.py", line 171, in sanitize_token
    token["data"][::-1]
TypeError: unhashable type
Please provide any additional information below.
The faulty token is:
{'namespace': u'None', 'type': 'StartTag', 'name': u'h1', 'data': {}}
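
For illustration only, here is a minimal sketch of why the slice fails and how a caller might work around it. It assumes the sanitizer expects "data" to be a sequence of (name, value) pairs while the lxml tree walker emits a dict; that reading is inferred from the traceback, not confirmed against the html5lib source. The helper normalize_attrs is hypothetical and is not part of html5lib.

```python
def normalize_attrs(data):
    """Return token attributes as a list of (name, value) pairs.

    Hypothetical helper, not part of html5lib. Accepts either a dict
    (as in the faulty token above) or a sequence of pairs.
    """
    if isinstance(data, dict):
        # Sorted only so the result is deterministic in this sketch.
        return sorted(data.items())
    return list(data)


# The faulty token from the report, reduced to the relevant keys.
token = {'type': 'StartTag', 'name': 'h1', 'data': {}}

# Slicing a dict is what fails at sanitizer.py line 171. Python 2.7
# raises TypeError here; some newer interpreters may raise KeyError
# instead, so both are caught.
try:
    token['data'][::-1]
    slicing_failed = False
except (TypeError, KeyError):
    slicing_failed = True

# After normalizing, the same reversal works on a list.
reversed_attrs = normalize_attrs(token['data'])[::-1]
```

Whether the proper fix belongs in the sanitizer (accepting dicts) or in the lxml tree walker (emitting pairs) is a decision for the maintainers; the sketch only shows the type mismatch.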