On Jul 29, 2011, at 3:00 PM, Matt wrote:

> I don't see any real reason to drop a decent piece of code (HTMLParser, that 
> is) in favor of a third party library when only relatively minor updates are 
> needed to bring it up to speed with the latest spec.

I am not really one to throw stones here, as Twisted contains a lenient 
pseudo-XML parser which I still maintain - one which decidedly does not follow 
HTML5's requirements for dealing with invalid data, and instead relies on a 
bunch of ad-hoc guesses of my own.

My impression of HTML5 is that HTMLParser would require significant 
modifications and possibly a drastic re-architecture in order to really do 
HTML5 "right", especially the parts that the html5lib authors claim make HTML5 
streaming-unfriendly, i.e. subtree reordering when encountering certain types 
of invalid data.
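
To make the streaming concern concrete, here is a minimal sketch, assuming 
Python 3's html.parser module (the renamed HTMLParser). EventRecorder and 
record_events are hypothetical names used only for illustration. The sketch 
shows that a callback-based parser reports stray text inside a table at its 
source position, whereas, as I understand the HTML5 tree-construction rules, 
that text has to end up before the table in the resulting tree, which a 
parser cannot express once it has already emitted the table's start tag.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records parser callbacks in the order they fire."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def record_events(markup):
    """Feed markup to the recorder and return the list of events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.events


# Text placed directly inside <table> is invalid; the event stream still
# reports it between the table's start and end tags, in source order.
events = record_events("<table><tr><td>a</td></tr>stray</table>")
for event in events:
    print(event)
```

A consumer that builds a tree from these events would have to go back and 
move the "stray" text after the fact, which is the kind of reordering that 
does not fit a purely streaming design.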

But if I'm wrong about that, and there are just a few spec updates and bugfixes 
that need to be applied, by all means, ignore my comment.

-glyph


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev