I like what you included in your proposal and suggest doing all that (over time) and taking the following into consideration:
Indexers/Crawlers

  General Settings
    SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
    IndexerTimeout - kill this crawler thread after a long period of inactivity
    IncludeFilter - include only items matching the filter
    ExcludeFilter - exclude items matching the filter (can be used with IncludeFilter)
    MaxItems - stop indexing after x items
    MaxMegs - stop indexing after x MB of data

  File System Indexer
    URLReplacePrefix - can crawl c:\ but expose the URL as http://mysever/docs/

  Web Indexer
    HTTPUser
    HTTPPassword
    HTTPUserAgent
    ProxyServer
    ProxyUser
    ProxyPassword
    HTTPSCertificate
    HTTPSPrivateKey

  Other Possible Indexers
    Microsoft Exchange 5.5/2000
    Lotus Notes
    Newsgroup (NNTP)
    Documentum
    ODBC/OLEDB
    XML - index a single XML file that represents multiple documents

Document Factory

  General
    The minimum properties for each document should be:
      URL
      Title
      Abstract
      Full Text
      Score

  HTML
    Support for META tags, including Dublin Core syntax

  Other Possible Document Factories
    Office Docs - DOC, XLS, PPT
    PDF

Thanks for the great proposal.

Mark Tucker

-----Original Message-----
From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 07, 2002 5:35 AM
To: Lucene Developers List
Subject: Proposal for Lucene

Hi All,

These are just a few thoughts about Lucene. Please send me your feedback, critiques and thoughts.

If you folks would take a look: http://www.trilug.org/~acoliver/luceneplan.html

If you'd like to submit patches: http://www.trilug.org/~acoliver/luceneplan.xml

Once I've gotten feedback from the developer community I'll send this to the user community as well.

Thanks,

Andy
--
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html - fix java generics!

The avalanche has already started. It is too late for the pebbles to vote.
-Ambassador Kosh
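To make the general crawler settings listed above a bit more concrete, here is a rough Java sketch of how an indexer might consult them. Everything in it is hypothetical: `CrawlerSettings` and its methods are illustrative names, not existing Lucene classes; IncludeFilter/ExcludeFilter are assumed to be regular expressions; and URLReplacePrefix is assumed to pair with a separate local root (here c:\) that gets swapped for the public URL.

```java
import java.util.regex.Pattern;

// Hypothetical holder for the general crawler settings (not Lucene API).
class CrawlerSettings {
    long sleepTimeBetweenCallsMillis = 0;   // SleeptimeBetweenCalls
    long indexerTimeoutMillis = 60_000;     // IndexerTimeout (assumed default)
    Pattern includeFilter = null;           // IncludeFilter; null = include everything
    Pattern excludeFilter = null;           // ExcludeFilter; null = exclude nothing
    int maxItems = Integer.MAX_VALUE;       // MaxItems
    long maxMegs = Long.MAX_VALUE;          // MaxMegs
    String localPrefix = null;              // assumed crawl root, e.g. c:\
    String urlReplacePrefix = null;         // URLReplacePrefix, e.g. http://mysever/docs/

    private static final long BYTES_PER_MEG = 1024L * 1024L;

    // SleeptimeBetweenCalls: pause between requests so one machine is not flooded.
    void pauseBetweenCalls() throws InterruptedException {
        if (sleepTimeBetweenCallsMillis > 0) {
            Thread.sleep(sleepTimeBetweenCallsMillis);
        }
    }

    // IndexerTimeout: true if the crawler thread has been idle too long and should be killed.
    boolean timedOut(long lastActivityMillis, long nowMillis) {
        return nowMillis - lastActivityMillis > indexerTimeoutMillis;
    }

    // IncludeFilter and ExcludeFilter used together: the path must match the
    // include pattern (if set) and must not match the exclude pattern (if set).
    boolean accepts(String path) {
        if (includeFilter != null && !includeFilter.matcher(path).matches()) return false;
        if (excludeFilter != null && excludeFilter.matcher(path).matches()) return false;
        return true;
    }

    // MaxItems / MaxMegs: true while the crawler may keep indexing.
    // Dividing bytes by BYTES_PER_MEG avoids overflowing maxMegs * BYTES_PER_MEG.
    boolean withinLimits(int itemsIndexed, long bytesIndexed) {
        return itemsIndexed < maxItems && bytesIndexed / BYTES_PER_MEG < maxMegs;
    }

    // URLReplacePrefix: expose a crawled local path under a public URL.
    String toPublicUrl(String localPath) {
        if (localPrefix == null || urlReplacePrefix == null || !localPath.startsWith(localPrefix)) {
            return localPath;
        }
        String rest = localPath.substring(localPrefix.length()).replace('\\', '/');
        return urlReplacePrefix + rest;
    }

    public static void main(String[] args) {
        CrawlerSettings s = new CrawlerSettings();
        s.includeFilter = Pattern.compile(".*\\.(html|txt)");
        s.excludeFilter = Pattern.compile(".*/private/.*");
        s.maxItems = 1000;
        s.maxMegs = 50;
        s.localPrefix = "c:\\";
        s.urlReplacePrefix = "http://mysever/docs/";

        System.out.println(s.accepts("docs/index.html"));           // true
        System.out.println(s.accepts("docs/private/notes.txt"));    // false
        System.out.println(s.accepts("docs/logo.gif"));             // false
        System.out.println(s.withinLimits(999, 10L * 1024 * 1024)); // true
        System.out.println(s.toPublicUrl("c:\\reports\\q1.txt"));   // http://mysever/docs/reports/q1.txt
    }
}
```

Keeping these limits in one shared object could let every indexer (file system, web, or any of the other sources listed) reuse the same throttling and filtering logic, with source-specific settings such as HTTPUser or ProxyServer living in a separate, web-only settings class.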