I'm developing it for a book I'm writing on Ant, and I've posted one piece of it here already - my HtmlDocument class that uses JTidy to DOM'ify the HTML and rip out the title and body contents as two separate fields (without HTML tags, of course).
I have every intention of giving all the code developed to Lucene or other Jakarta projects where appropriate. I haven't yet only because it's still under development - it's not top secret or anything. :)

The Ant task definitely deserves some additional Lucene expertise to make sure it's doing the right thing. I do have it checking dependencies, though: it embeds a non-indexed "last modified" field into the Lucene index and checks that field before actually indexing a document again, so a second incremental run of indexing is *much* faster since it skips files unless they are newer.

    Erik

----- Original Message -----
From: "Andrew C. Oliver" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Friday, February 08, 2002 8:19 AM
Subject: Re: Proposal for Lucene

> Is this open source? APL'd? Where can I look at it?
>
> On Thu, 2002-02-07 at 22:00, Erik Hatcher wrote:
> > I've developed something similar myself. I've created an Ant task <index>
> > that uses DocumentHandler interface implementing classes - one that can be
> > used (<index class="...">) is a FileExtensionDocumentHandler. At build-time
> > I generate a Lucene index of static documents, and roll that into a web
> > application.
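
For anyone curious about the "last modified" dependency check mentioned in the message, here is a rough sketch of the idea only. It is not the actual Ant task code: the class name IncrementalIndexSketch is hypothetical, and a plain in-memory Map stands in for the timestamp field that the real task keeps in the Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: the map below stands in for the
// "last modified" value stored per document in the index.
class IncrementalIndexSketch {
    private final Map<String, Long> storedModified = new HashMap<>();

    // True when the file has never been indexed or is newer than the stored timestamp.
    boolean needsIndexing(String path, long lastModified) {
        Long stored = storedModified.get(path);
        return stored == null || lastModified > stored;
    }

    // Indexes only out-of-date files and returns the paths that were (re)indexed.
    List<String> run(Map<String, Long> files) {
        List<String> indexed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : files.entrySet()) {
            if (needsIndexing(entry.getKey(), entry.getValue())) {
                // A real task would parse the file and add it to the index here.
                storedModified.put(entry.getKey(), entry.getValue());
                indexed.add(entry.getKey());
            }
        }
        return indexed;
    }
}
```

Calling run twice with the same file timestamps returns an empty list the second time, which is why the second incremental run is so much cheaper.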