My suggestion would be to modify HTMLParser to do the job: have it skip any text that falls between <no-index> and </no-index>. I don't think that would be very difficult. I'm not aware of any existing HTML parser that supports this out of the box.
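If changing the parser itself is more than you want to take on, one alternative is to strip the <no-index> blocks from the HTML before the parser ever sees it. The following is only a rough sketch of that idea, not a patch to the demo HTMLParser: the class name NoIndexFilter and the method strip are made up for illustration, and it assumes the blocks are properly closed and not nested.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): removes <no-index> blocks
// from raw HTML so that their text is never handed to the indexer.
public class NoIndexFilter {

    // Matches <no-index ...> through the nearest </no-index>,
    // case-insensitively and across line breaks.
    // Assumption: blocks are closed and not nested.
    private static final Pattern NO_INDEX = Pattern.compile(
        "<no-index\\b[^>]*>.*?</no-index\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Replaces every matched block with a single space so that the
    // words on either side of it do not run together.
    public static String strip(String html) {
        return NO_INDEX.matcher(html).replaceAll(" ");
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<no-index>International<br>Business<br>Science<br></no-index>"
            + "<p>Article text to index.</p>"
            + "</body></html>";
        // The navigation block is gone; the paragraph is kept.
        System.out.println(strip(page));
    }
}
```

You would run the page source through strip() and pass the result to whatever parser you already use. An unclosed <no-index> tag is left untouched by this sketch, so its text would still be indexed; handling that case, or nested blocks, really does call for a change inside the parser.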
Regards,
Kelvin

--------
The book giving manifesto - http://how.to/sharethisbook

On Thu, 30 Jan 2003 10:56:50 +0100, Michael Wechner said:

>Hi
>
>I am looking for an HTMLParser which skips text tagged by
>
><no-index> or something similar. This way I could exclude for
>instance a "global navigation section" within the HTML
>
><no-index> International<br> Business<br> Science<br> ...
></no-index>
>
>It seems that the current demo/HTMLParser
>(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q11)
>is not capable of doing something like that.
>
>Any pointers are very welcome.
>
>Thanks a lot
>
>Michael
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
