Comment #2 on issue 211 by jos...@metaoptimize.com: Awful memory leak / infinite loop
http://code.google.com/p/html5lib/issues/detail?id=211

The Markdown comes from the wild and is probably invalid.

My idea was to pass the HTML through tidy before running an HTML parser, in the hope of avoiding the infinite loop. There are several tidy wrappers in Python. I used pytidylib.

I didn't experiment with the options that make tidy stricter, and even after tidy, html5lib still goes into an infinite loop. So my current workaround is to use tidy followed by lxml instead of html5lib :\
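
For anyone who wants to try the same thing, here is a rough sketch of the tidy-then-lxml workaround, not the exact code I ran. The function name tidy_then_parse and its tidy/parse parameters are made up for illustration; the defaults assume pytidylib's tidy_document with its default options and lxml.html.document_fromstring, and pytidylib also needs the libtidy C library installed.

```python
def tidy_then_parse(html, tidy=None, parse=None):
    """Clean `html` with tidy, then parse the cleaned output.

    `tidy` and `parse` are hypothetical hooks added for this sketch so the
    two stages can be swapped or stubbed; by default they are pytidylib's
    tidy_document and lxml.html.document_fromstring.
    """
    if tidy is None:
        # pytidylib installs the `tidylib` module (requires libtidy).
        from tidylib import tidy_document
        tidy = tidy_document
    if parse is None:
        import lxml.html
        parse = lxml.html.document_fromstring

    # tidy_document returns a (cleaned_document, error_report) pair.
    cleaned, errors = tidy(html)
    # Hand the cleaned markup to the parser (lxml here, not html5lib).
    return parse(cleaned), errors


if __name__ == "__main__":
    tree, errors = tidy_then_parse("<p>unclosed <b>markup")
    print(tree.tag)
    print(errors)
```

This only sidesteps the problem by not calling html5lib at all; it does nothing about the loop in html5lib itself.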

