I've developed something similar myself. I've created an Ant task, <index>, that delegates to classes implementing a DocumentHandler interface; one that can be used (<index class="...">) is FileExtensionDocumentHandler. At build time I generate a Lucene index of static documents and roll it into a web application.
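To make the handler idea concrete, here is a minimal sketch of choosing a handler by file extension. It is not the code from the task described above: the `ExtensionDispatch` class, the shape of the `DocumentHandler` interface, and the `register`/`lookup` methods are all assumptions made for illustration, and plain maps stand in for Lucene documents so the example has no dependencies.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: pick a document handler by file extension. */
public class ExtensionDispatch {

    /** Assumed stand-in for the DocumentHandler interface; the real signature is not shown in this thread. */
    public interface DocumentHandler {
        /** Returns field name -> field value for one document. */
        Map<String, String> handle(String path, String rawContents);
    }

    private final Map<String, DocumentHandler> handlers = new HashMap<>();

    /** Registers a handler for an extension such as "html" (case-insensitive). */
    public void register(String extension, DocumentHandler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the handler for the path's extension, or null if none is registered. */
    public DocumentHandler lookup(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot == path.length() - 1) {
            return null; // no extension to dispatch on
        }
        return handlers.get(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        ExtensionDispatch dispatch = new ExtensionDispatch();
        // Store the raw text in a "rawcontents" field, mirroring the field name mentioned in the message.
        dispatch.register("txt", (path, raw) -> {
            Map<String, String> fields = new HashMap<>();
            fields.put("path", path);
            fields.put("rawcontents", raw);
            return fields;
        });

        DocumentHandler handler = dispatch.lookup("docs/README.TXT");
        System.out.println(handler.handle("docs/README.TXT", "hello").get("rawcontents")); // prints "hello"
        System.out.println(dispatch.lookup("docs/logo.png")); // prints "null": no handler registered
    }
}
```

In a real task the handler would presumably build a Lucene document instead of a map; that part is left out here because the thread does not show it.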
It's got some kinks, such as how to deal with documents that contain relative hyperlinks: these documents should either be copied into the WAR too (or somehow made accessible to the web app) or be incorporated directly into a Lucene field ("rawcontents" is what I'm using now). These issues are not tough to solve, and some additional parameters on my IndexTask could let the user customize such things.

My task is still evolving, but my plan all along has been to donate it to lucene-dev for incorporation in some form or another. Let me know if you'd like it, and what package name you'd like to use.

Erik

----- Original Message -----
From: "Kelvin Tan" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Thursday, February 07, 2002 8:27 PM
Subject: Re: Proposal for Lucene

Great suggestions all around, and I'm pretty much in agreement with what's been said.

In my app, I've built a mini-framework around the searching such that I'm able to map ContentHandlers (which index file contents) to file extensions. I've been wanting to clean it up and contribute it for a while, but haven't overcome the inertia to do so.

I've also introduced a DataSource (which can be pretty much anything, like a filesystem, a database, a URL, etc.) from which to obtain the data to index, so I think it _could_ be in line with what some of you have in mind. I could also use a lot of feedback on what's been done.

So what's the plan to move forward?
K

----- Original Message -----
From: Mark Tucker
To: Lucene Developers List
Sent: Friday, February 08, 2002 4:03 AM
Subject: RE: Proposal for Lucene

I like what you included in your proposal and suggest doing all that (over time) and taking the following into consideration:

Indexers/Crawlers
    General Settings
        SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
        IndexerTimeout - kill this crawler thread after a long period of inactivity
        IncludeFilter - include only items matching filter
        ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
        MaxItems - stops indexing after x items
        MaxMegs - stops indexing after x MB of data
    File System Indexer
        URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
    Web Indexer
        HTTPUser
        HTTPPassword
        HTTPUserAgent
        ProxyServer
        ProxyUser
        ProxyPassword
        HTTPSCertificate
        HTTPSPrivateKey
    Other Possible Indexers
        Microsoft Exchange 5.5/2000
        Lotus Notes
        Newsgroup (NNTP)
        Documentum
        ODBC/OLEDB
        XML - index a single XML file that represents multiple documents

Document Factory
    General
        The minimum properties for each document should be:
            URL
            Title
            Abstract
            Full Text
            Score
    HTML
        Support for META tags, including Dublin Core syntax
    Other Possible Document Factories
        Office Docs - DOC, XLS, PPT
        PDF

Thanks for the great proposal.

Mark Tucker

-----Original Message-----
From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 07, 2002 5:35 AM
To: Lucene Developers List
Subject: Proposal for Lucene

Hi All,

These are just a few thoughts about Lucene. Please send me your feedback, critiques and thoughts.

If you folks would take a look:
http://www.trilug.org/~acoliver/luceneplan.html

If you'd like to submit patches:
http://www.trilug.org/~acoliver/luceneplan.xml

Once I've gotten feedback from the developer community I'll send this to the user community as well.
Thanks,
Andy
--
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html - fix java generics!

The avalanche has already started. It is too late for the pebbles to vote.
-Ambassador Kosh

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
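
For reference, a few of the general crawler settings suggested in Mark's message (IncludeFilter, ExcludeFilter, MaxItems) could be sketched roughly as follows. The thread does not define their semantics, so this sketch assumes the filters are regular expressions, that a null filter means "no restriction", and that MaxItems counts accepted items only. The `CrawlSettings` class and its methods are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of IncludeFilter / ExcludeFilter / MaxItems; semantics are assumed. */
public class CrawlSettings {
    private final Pattern includeFilter; // null means include everything (assumption)
    private final Pattern excludeFilter; // null means exclude nothing (assumption)
    private final int maxItems;          // stop after this many accepted items

    public CrawlSettings(String includeRegex, String excludeRegex, int maxItems) {
        this.includeFilter = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.excludeFilter = excludeRegex == null ? null : Pattern.compile(excludeRegex);
        this.maxItems = maxItems;
    }

    /** An item is accepted if it matches the include filter and does not match the exclude filter. */
    public boolean accepts(String item) {
        boolean included = includeFilter == null || includeFilter.matcher(item).find();
        boolean excluded = excludeFilter != null && excludeFilter.matcher(item).find();
        return included && !excluded;
    }

    /** Returns accepted items in order, stopping once maxItems have been collected. */
    public List<String> select(List<String> candidates) {
        List<String> selected = new ArrayList<>();
        for (String item : candidates) {
            if (selected.size() >= maxItems) {
                break; // MaxItems reached
            }
            if (accepts(item)) {
                selected.add(item);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        CrawlSettings settings = new CrawlSettings("\\.html$", "/private/", 2);
        List<String> candidates = Arrays.asList(
                "/a.html", "/private/b.html", "/c.pdf", "/d.html", "/e.html");
        System.out.println(settings.select(candidates)); // prints "[/a.html, /d.html]"
    }
}
```

Applying the exclude filter after the include filter is one way to read "can be used with IncludeFilter"; other combinations are equally plausible.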