Great suggestions all around, and I'm pretty much in agreement with what's been said.
In my app, I've built a mini-framework around the searching such that I'm able to map ContentHandlers (which index file contents) to file extensions. I've been wanting to clean it up and contribute it for a while, but haven't overcome the inertia to do so. I also introduced a DataSource (which can be pretty much anything: a filesystem, a database, a URL, etc.) from which to obtain the data to index, so I think it _could_ be in line with what some of you have in mind. I could also use a lot of feedback on what's been done, too...

So what's the plan to move forward?

K

----- Original Message -----
From: Mark Tucker
To: Lucene Developers List
Sent: Friday, February 08, 2002 4:03 AM
Subject: RE: Proposal for Lucene

I like what you included in your proposal and suggest doing all that (over time) and taking the following into consideration:

Indexers/Crawlers
    General Settings
        SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
        IndexerTimeout - kill this crawler thread after a long period of inactivity
        IncludeFilter - include only items matching the filter
        ExcludeFilter - exclude items matching the filter (can be used with IncludeFilter)
        MaxItems - stops indexing after x items
        MaxMegs - stops indexing after x MB of data
    File System Indexer
        URLReplacePrefix - can crawl c:\ but expose the URL as http://myserver/docs/
    Web Indexer
        HTTPUser
        HTTPPassword
        HTTPUserAgent
        ProxyServer
        ProxyUser
        ProxyPassword
        HTTPSCertificate
        HTTPSPrivateKey
    Other Possible Indexers
        Microsoft Exchange 5.5/2000
        Lotus Notes
        Newsgroup (NNTP)
        Documentum
        ODBC/OLEDB
        XML - index a single XML file that represents multiple documents
Document Factory
    General
        The minimum properties for each document should be:
            URL
            Title
            Abstract
            Full Text
            Score
    HTML
        Support for META tags, including Dublin Core syntax
    Other Possible Document Factories
        Office Docs - DOC, XLS, PPT
        PDF

Thanks for the great proposal.

Mark Tucker

-----Original Message-----
From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 07, 2002 5:35 AM
To: Lucene Developers List
Subject: Proposal for Lucene

Hi All,

These are just a few thoughts about Lucene. Please send me your feedback, critiques and thoughts.

If you folks would take a look:
http://www.trilug.org/~acoliver/luceneplan.html

If you'd like to submit patches:
http://www.trilug.org/~acoliver/luceneplan.xml

Once I've gotten feedback from the developer community, I'll send this to the user community as well.

Thanks,

Andy
--
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html - fix java generics!
The avalanche has already started. It is too late for the pebbles to vote.
-Ambassador Kosh

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
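K's description of ContentHandlers keyed by file extension and fed from a pluggable DataSource might look roughly like the sketch below. It is a hypothetical illustration, not K's actual code (which wasn't posted): the ContentHandler and DataSource names come from the message, but every method signature, the HandlerRegistry and MiniIndexer classes, and the regex-based HTML handler are assumptions. The maxItems parameter loosely mirrors Mark's MaxItems setting. Handing the extracted text to Lucene is deliberately left out rather than guessing at a particular Lucene version's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical: turns the raw bytes of one kind of file into indexable text. */
interface ContentHandler {
    String extractText(byte[] content);
}

/** Hypothetical: anything that can enumerate named items (filesystem, database, URL, ...). */
interface DataSource {
    Map<String, byte[]> items();
}

/** Plain text: the bytes already are the text (assumes UTF-8). */
class PlainTextHandler implements ContentHandler {
    public String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

/** Naive HTML handler: replaces tags with spaces via a regex (illustration only, not a real parser). */
class NaiveHtmlHandler implements ContentHandler {
    public String extractText(byte[] content) {
        String html = new String(content, StandardCharsets.UTF_8);
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}

/** Maps lower-cased file extensions to the handler responsible for them. */
class HandlerRegistry {
    private final Map<String, ContentHandler> byExtension = new HashMap<>();

    void register(String extension, ContentHandler handler) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /** Returns the handler for the name's extension, or null if there is none. */
    ContentHandler lookup(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null; // no extension at all
        }
        return byExtension.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}

/** Walks a DataSource and extracts text from every item that has a registered handler. */
class MiniIndexer {
    /**
     * Stops once maxItems items have been extracted (roughly Mark's MaxItems);
     * items with no registered handler are skipped and do not count toward the cap.
     */
    static Map<String, String> extractAll(DataSource source, HandlerRegistry registry, int maxItems) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> item : source.items().entrySet()) {
            if (out.size() >= maxItems) {
                break;
            }
            ContentHandler handler = registry.lookup(item.getKey());
            if (handler == null) {
                continue;
            }
            out.put(item.getKey(), handler.extractText(item.getValue()));
        }
        return out;
    }
}
```

Because DataSource has a single method, an in-memory source for testing can be a lambda such as `() -> someMap`; a filesystem- or URL-backed source would implement the same interface, which is the decoupling K describes.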