This is a question similar to the Meta-Tags question posted to the list earlier today (or was it yesterday?). The Lucene distribution includes a few simple applications that demonstrate what Lucene is capable of and how it can be used, but those demos are not really part of the Lucene code. Lucene itself indexes whatever text you hand it, so HTML tags end up in the index only if you pass them in. It is up to you to write the application around Lucene, which in your case would include HTML parsing.
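To make the "parse first, then index" idea concrete, here is a minimal sketch that strips tags using only the HTML parser bundled with the JDK (javax.swing.text.html). It is not taken from the Lucene demos; the class name HtmlTextExtractor and the sample markup are made up for illustration, and the Lucene indexing step itself is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, for illustration only.
public class HtmlTextExtractor {

    /** Returns the character data of the HTML with tags dropped. */
    public static String extractText(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for character data only; tag names such as "body"
                // are reported through other callbacks and never arrive here.
                text.append(data).append(' ');
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        // The returned string, not the raw HTML, is what you would give Lucene to index.
        System.out.println(extractText(html));
    }
}
```

With something like this in front of the indexer, a search for "body" would match only documents whose visible text contains that word.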
Perhaps JTidy (http://jtidy.sf.net/) could come in handy here...

Otis

--- Emmanuel Bridonneau <[EMAIL PROTECTED]> wrote:
> I am confused about how Lucene performs the parsing of an HTML
> document. It doesn't do any tag stripping (or does it?), so does that
> mean it also indexes all HTML tags? If so, then a search for "body"
> will return any and all HTML documents previously indexed.
> I'd appreciate it if anyone could shed some light on the FAQ.10 about
> indexing?

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
