Hello, I have had some good experiences with JTidy. It works like a DOM XML parser and cleans up the HTML along the way.
I'm trying to index HTML files with Lucene.
Do you know what the best HTML parser in Java is? The most powerful one?
I need to extract meta tags and many other text fields...
Thanks for your help ;)
This is VERY useful, because EVERY HTML document has at least ONE error.
Documents that were unparsable with Neko, JTidy parsed without problems.
Writing the indexing program took about 2 hours of work.
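
To give an idea of the extraction step, here is a rough sketch, not the actual indexing program. It only assumes you already have an org.w3c.dom.Document in hand. With JTidy you would get one from Tidy.parseDOM(inputStream, null). To keep the sketch runnable without extra jars, the demo builds the DOM with the JDK's own XML parser from a small well-formed page instead. The class and method names (MetaExtractor, extractMeta, extractTitle, parseWellFormed) are made up for illustration, and it assumes element names come out in lower case.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: pulls indexable fields out of a DOM tree.
public class MetaExtractor {

    // Collects name -> content pairs from every <meta name="..."> element.
    public static Map<String, String> extractMeta(Document doc) {
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            String name = meta.getAttribute("name");
            // Skip <meta> elements without a name (e.g. http-equiv ones).
            if (name != null && !name.isEmpty()) {
                fields.put(name.toLowerCase(Locale.ROOT), meta.getAttribute("content"));
            }
        }
        return fields;
    }

    // Returns the trimmed text of the first <title>, or "" if there is none.
    public static String extractTitle(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        if (titles.getLength() == 0) {
            return "";
        }
        StringBuilder text = new StringBuilder();
        for (Node n = titles.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                text.append(n.getNodeValue());
            }
        }
        return text.toString().trim();
    }

    // Stand-in for the parser step, for the demo only: it accepts just
    // well-formed markup. For real-world HTML, swap this for JTidy.
    public static Document parseWellFormed(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed(
                "<html><head><title>Dance club</title>"
                + "<meta name=\"description\" content=\"Lessons and events\"/>"
                + "</head><body><p>Hi</p></body></html>");
        System.out.println(extractTitle(doc));
        System.out.println(extractMeta(doc));
    }
}
```

Each entry of the returned map, plus the title, can then be added as a separate field of a Lucene document.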
-- Lukas Zapletal [EMAIL PROTECTED] http://www.tanecni-olomouc.cz/lzap
