Yeah, I was using HTMLParser for a few days until I tried to parse a 400K document and it spun at 100% CPU for a very long time. It is tolerant of bad HTML, but does not appear to scale. TagSoup processed the same document in a second or less at <25% CPU.

-Mike

At 02:42 PM 9/22/2003 +0200, you wrote:

TagSoup is great - however, it is not maintained nor developed (the same could be said about JTidy as well, but TagSoup's history is much shorter...). I'm using HTMLParser (http://htmlparser.sourceforge.net) for my application, and it also works very well, even for ill-formed input. It's also very actively developed.

--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------



--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to