Status: Accepted
Owner: geoffers
Labels: Type-Defect Python
New issue 202 by geoffers: Unicode file breaks InputStream
http://code.google.com/p/html5lib/issues/detail?id=202
What steps will reproduce the problem?
import html5lib, StringIO
html5lib.parse(StringIO.StringIO(u"a"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "html5lib/html5parser.py", line 54, in parse
    return p.parse(doc, encoding=encoding)
  File "html5lib/html5parser.py", line 247, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "html5lib/html5parser.py", line 110, in _parse
    parser=self, **kwargs)
  File "html5lib/tokenizer.py", line 42, in __init__
    self.stream = HTMLInputStream(stream, encoding, parseMeta, useChardet)
  File "html5lib/inputstream.py", line 162, in __init__
    self.charEncoding = self.detectEncoding(parseMeta, chardet)
  File "html5lib/inputstream.py", line 217, in detectEncoding
    encoding = self.detectBOM()
  File "html5lib/inputstream.py", line 282, in detectBOM
    assert isinstance(string, str)
AssertionError
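In short, we don't handle a file-like object whose read() returns Unicode
strings rather than byte strings: detectBOM asserts that the data it reads
is a str, so the assertion fails before any parsing happens.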
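
One possible way to tell the two kinds of stream apart is sketched below. It is not html5lib code and not necessarily how the fix should look: returns_text and read_document are hypothetical names, the cp1252 fallback is an assumption made for illustration, and the sketch is written for Python 3 (where str is text and bytes is the byte type) rather than the Python 2 used in the report. The idea is that a zero-length read consumes nothing but still reveals the type the stream returns, so BOM sniffing can be skipped for input that is already decoded.

```python
import codecs
import io


def returns_text(stream):
    """Return True if stream.read() yields text rather than bytes."""
    # A zero-length read consumes no input but has the stream's return type.
    return isinstance(stream.read(0), str)


def read_document(stream, fallback_encoding="cp1252"):
    """Hypothetical helper: return (text, encoding), encoding None for text input."""
    if returns_text(stream):
        # Already decoded, so BOM and encoding detection do not apply.
        return stream.read(), None
    data = stream.read()
    if data.startswith(codecs.BOM_UTF8):
        # Strip the UTF-8 BOM and decode the remainder.
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
    # Assumed fallback for this sketch only; real detection is more involved.
    return data.decode(fallback_encoding), fallback_encoding


if __name__ == "__main__":
    print(read_document(io.StringIO("a")))
    print(read_document(io.BytesIO(codecs.BOM_UTF8 + b"a")))
```

With a check like this, the Unicode case from the report would take the first branch and never reach the byte-only BOM assertion.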