On Jul 29, 2011, at 3:00 PM, Matt wrote:

> I don't see any real reason to drop a decent piece of code (HTMLParser, that
> is) in favor of a third party library when only relatively minor updates are
> needed to bring it up to speed with the latest spec.

I am not really one to throw stones here, as Twisted contains a lenient pseudo-XML parser which I still maintain, one which decidedly does not follow HTML5's requirements for dealing with invalid data and instead relies on a bunch of ad-hoc guesses of my own.

My impression of HTML5 is that HTMLParser would require significant modifications, and possibly a drastic re-architecture, in order to really do HTML5 "right"; especially the parts that the html5lib authors claim make HTML5 streaming-unfriendly, i.e. subtree reordering when encountering certain types of invalid data.

But if I'm wrong about that, and there are just a few spec updates and bugfixes that need to be applied, by all means, ignore my comment.

-glyph
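
To make the streaming concern a bit more concrete, here is a minimal sketch, assuming Python 3's html.parser module (the successor to the HTMLParser module discussed in this thread). EventLogger and log_events are hypothetical names made up for illustration. It only shows that the stdlib parser reports misnested tags as events in source order and leaves any repair to the caller; it does not attempt the tree restructuring that HTML5 describes for this kind of input.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records parser callbacks in the order they fire."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def log_events(markup):
    """Feed markup to the parser and return the recorded event list."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Misnested formatting tags: </b> arrives while <i> is still open.
events = log_events("<b><i>one</b>two</i>")
for event in events:
    print(event)
```

The events come out exactly as written in the input, with the end tag for b reported before the end tag for i. A consumer that wants HTML5-style handling would have to build and fix up the tree itself, which is where the re-architecture question comes in.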
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev