Status: New
Owner: ----

New issue 210 by r.kin...@gmail.com: Sanitizer and lxml tree walker: TypeError: unhashable type
http://code.google.com/p/html5lib/issues/detail?id=210

What steps will reproduce the problem?

from html5lib import HTMLParser
from html5lib.treebuilders import getTreeBuilder
from html5lib.treewalkers import getTreeWalker
from html5lib.filters.sanitizer import Filter as Sanitizer
html = "<html><body><h1>Header"

parser = HTMLParser(tree = getTreeBuilder("lxml"),
        namespaceHTMLElements = False)
doc = parser.parse(html)
root = doc.getroot()
body = doc.xpath('/html/body')
walker = getTreeWalker('lxml')
stream = walker(body)
stream = Sanitizer(stream)
for token in stream:
    print token


What is the expected output? What do you see instead?

I do not know exactly what should be printed. Instead, an exception is raised:

$ python t.py
{'namespace': u'None', 'type': 'Characters', 'data': u'<body>'}
Traceback (most recent call last):
  File "t.py", line 17, in <module>
    for token in stream:
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/sanitizer.py", line 7, in __iter__
    token = self.sanitize_token(token)
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/sanitizer.py", line 171, in sanitize_token
    token["data"][::-1]
TypeError: unhashable type


Please provide any additional information below.

The faulty token is:
{'namespace': u'None', 'type': 'StartTag', 'name': u'h1', 'data': {}}
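
The failure can be reproduced without html5lib, which suggests the sanitizer is slicing a dict where it expects a sequence of attribute pairs. The following is only a minimal sketch: `slicing_fails` and `normalize_attrs` are hypothetical helpers, not part of html5lib, and the assumption that the sanitizer wants a list of (name, value) pairs is inferred solely from the `token["data"][::-1]` line in the traceback.

```python
# The token reported above: "data" is a dict, not a list of pairs.
token = {'namespace': 'None', 'type': 'StartTag', 'name': 'h1', 'data': {}}


def slicing_fails(data):
    """Return True if data[::-1] raises, as in the traceback above."""
    try:
        data[::-1]
    except (TypeError, KeyError):
        # Python 2.7 reports "TypeError: unhashable type"; interpreters
        # where slice objects are hashable raise KeyError instead.
        return True
    return False


def normalize_attrs(token):
    """Hypothetical helper: copy the token, turning dict data into pairs."""
    fixed = dict(token)
    if isinstance(fixed.get('data'), dict):
        # Assumption: a sorted list of (name, value) pairs is acceptable.
        fixed['data'] = sorted(fixed['data'].items())
    return fixed


print(slicing_fails(token['data']))                   # True
print(slicing_fails(normalize_attrs(token)['data']))  # False
```

If this reading is right, wrapping the walker output in a small generator that applies such a conversion before the Sanitizer might avoid the exception, though the proper fix presumably belongs in the sanitizer or the tree walker.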

