I use html5lib in my project.  It works well, apart from one possible
minor bug.
I got the following error message:
  File "myfolder\parser.py", line 30, in parser
    minidom_document = parser.parse(fp)
  File "build\bdist.win32\egg\html5lib\html5parser.py", line 144, in parse
  File "build\bdist.win32\egg\html5lib\html5parser.py", line 116, in _parse
  File "build\bdist.win32\egg\html5lib\tokenizer.py", line 98, in __iter__
  File "build\bdist.win32\egg\html5lib\tokenizer.py", line 333, in dataState
  File "build\bdist.win32\egg\html5lib\inputstream.py", line 282, in charsUntil
  File "build\bdist.win32\egg\html5lib\inputstream.py", line 259, in readChunk
IndexError: string index out of range

I think it is caused by the following code:

        if (self._lastChunkEndsWithCR and data[0] == "\n"):
            data = data[1:]
        self._lastChunkEndsWithCR = data[-1] == "\r"

If data contains only a single "\n" and self._lastChunkEndsWithCR
happens to be True, then data is "" after the first two lines, so
data[-1] on the third line raises the IndexError.
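
To check this reasoning outside the library, here is a minimal sketch that
copies the three quoted lines into a standalone function. The names
strip_leading_lf and reproduce are hypothetical and are not part of
html5lib; the instance attribute is passed in as a plain argument.

```python
def strip_leading_lf(data, last_chunk_ends_with_cr):
    """Hypothetical stand-in for the three quoted lines (not html5lib API)."""
    if last_chunk_ends_with_cr and data[0] == "\n":
        data = data[1:]
    # When data was exactly "\n", it is now "" and data[-1] raises IndexError.
    return data, data[-1] == "\r"


def reproduce():
    """Return the error message if the failing case raises, else None."""
    try:
        strip_leading_lf("\n", True)
    except IndexError as exc:
        return str(exc)
    return None
```

Calling reproduce() exercises the failing case: a chunk that is just "\n"
arriving right after a chunk that ended with "\r".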

Adding the following code after the second line made the error go
away:

        if not data:
            return
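
As a possible alternative to the early return, here is a sketch that avoids
indexing altogether by using str.startswith and str.endswith, both of which
are safe on an empty string. The function name strip_leading_lf_safe is
hypothetical, and I have not checked what the rest of readChunk expects
when data ends up empty, so treat this as an illustration only.

```python
def strip_leading_lf_safe(data, last_chunk_ends_with_cr):
    """Hypothetical, index-free variant of the quoted lines (not html5lib API)."""
    # startswith() is False for "", so an empty chunk cannot raise here.
    if last_chunk_ends_with_cr and data.startswith("\n"):
        data = data[1:]
    # endswith() is also False for "", so the flag is simply reset.
    return data, data.endswith("\r")
```

With this version the failing input ("\n" after a chunk ending in "\r")
yields an empty string and a False flag instead of an exception. The
caller would still have to decide what to do with an empty chunk.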

