I like what you included in your proposal and suggest doing all of it (over time),
taking the following into consideration:

Indexers/Crawlers

        General Settings
                SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
                IndexerTimeout - kill this crawler thread after a long period of inactivity
                IncludeFilter - include only items matching filter
                ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
                MaxItems - stops indexing after x items
                MaxMegs - stops indexing after x MB of data
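
As a rough sketch of how these general settings might fit together (the class and method names are hypothetical, not existing Lucene code, and I am assuming the filters are regular expressions matched against the item's URL; IndexerTimeout is left out because it depends on how crawler threads are managed):

```java
import java.util.regex.Pattern;

// Hypothetical holder for the General Settings above; not part of Lucene.
final class CrawlSettings {
    long sleeptimeBetweenCallsMillis = 0;  // SleeptimeBetweenCalls
    Pattern includeFilter = null;          // IncludeFilter (null = include everything)
    Pattern excludeFilter = null;          // ExcludeFilter (null = exclude nothing)
    int maxItems = Integer.MAX_VALUE;      // MaxItems
    long maxMegs = Long.MAX_VALUE;         // MaxMegs

    // An item is indexed only if it matches IncludeFilter (when set)
    // and does not match ExcludeFilter (when set).
    boolean accepts(String url) {
        if (includeFilter != null && !includeFilter.matcher(url).find()) {
            return false;
        }
        if (excludeFilter != null && excludeFilter.matcher(url).find()) {
            return false;
        }
        return true;
    }

    // True once either the item limit or the size limit has been reached.
    boolean limitReached(int itemsIndexed, long bytesIndexed) {
        long megsIndexed = bytesIndexed / (1024L * 1024L);
        return itemsIndexed >= maxItems || megsIndexed >= maxMegs;
    }

    // Waits between two requests so the target machine is not flooded.
    void pauseBetweenCalls() throws InterruptedException {
        if (sleeptimeBetweenCallsMillis > 0) {
            Thread.sleep(sleeptimeBetweenCallsMillis);
        }
    }
}
```

A crawler loop would call accepts() for each candidate item, pauseBetweenCalls() before each fetch, and stop as soon as limitReached() returns true.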

        File System Indexer
                URLReplacePrefix - can crawl c:\ but expose URL as http://myserver/docs/
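
To illustrate what I mean by URLReplacePrefix, here is a minimal sketch. The class name, the case-insensitive root match (assumed because Windows paths ignore case) and the example paths are all my assumptions, and it does no URL escaping of special characters:

```java
// Hypothetical helper for the File System Indexer; names are illustrative only.
final class UrlPrefixMapper {
    private final String fileRoot;   // directory that was crawled, e.g. "c:\\docs\\"
    private final String urlPrefix;  // URLReplacePrefix value, e.g. "http://myserver/docs/"

    UrlPrefixMapper(String fileRoot, String urlPrefix) {
        this.fileRoot = fileRoot;
        this.urlPrefix = urlPrefix;
    }

    // Replaces the crawled root with the URL prefix and turns
    // backslashes into forward slashes.
    String toUrl(String filePath) {
        if (!filePath.regionMatches(true, 0, fileRoot, 0, fileRoot.length())) {
            throw new IllegalArgumentException("Not under crawl root: " + filePath);
        }
        String relative = filePath.substring(fileRoot.length()).replace('\\', '/');
        return urlPrefix + relative;
    }
}
```

With a root of c:\docs\ and a prefix of http://myserver/docs/, the file c:\docs\specs\plan.html would be exposed as http://myserver/docs/specs/plan.html.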
                
        Web Indexer
                HTTPUser
                HTTPPassword
                HTTPUserAgent
                ProxyServer
                ProxyUser
                ProxyPassword
                HTTPSCertificate
                HTTPSPrivateKey
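
For the Web Indexer settings, a sketch along these lines shows where each value would go. It assumes the java.net.http client from Java 11 or later, and the class and method names are made up for illustration. HTTPSCertificate and HTTPSPrivateKey are not covered; they would need an SSLContext built from a key store.

```java
import java.net.Authenticator;
import java.net.InetSocketAddress;
import java.net.PasswordAuthentication;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

// Hypothetical wiring of the Web Indexer settings; not existing Lucene code.
final class WebIndexerHttp {

    // ProxyServer, ProxyUser/ProxyPassword and HTTPUser/HTTPPassword.
    static HttpClient newClient(String proxyHost, int proxyPort,
                                String proxyUser, String proxyPassword,
                                String httpUser, String httpPassword) {
        return HttpClient.newBuilder()
            .proxy(ProxySelector.of(InetSocketAddress.createUnresolved(proxyHost, proxyPort)))
            .authenticator(new Authenticator() {
                @Override
                protected PasswordAuthentication getPasswordAuthentication() {
                    // Answer proxy challenges and web server challenges
                    // with their own credentials.
                    if (getRequestorType() == RequestorType.PROXY) {
                        return new PasswordAuthentication(proxyUser, proxyPassword.toCharArray());
                    }
                    return new PasswordAuthentication(httpUser, httpPassword.toCharArray());
                }
            })
            .build();
    }

    // HTTPUserAgent is sent as the User-Agent header of each request.
    static HttpRequest newRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("User-Agent", userAgent)
            .GET()
            .build();
    }
}
```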

        Other Possible Indexers
                Microsoft Exchange 5.5/2000
                Lotus Notes
                Newsgroup (NNTP)
                Documentum
                ODBC/OLEDB
                XML - index single XML that represents multiple documents
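
For the XML indexer, I am picturing something like the sketch below. The XML layout (a "document" element with a "url" attribute and "title" and "text" children) is only an assumed example format, and the class name is hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for one XML file that represents several documents.
final class XmlBundleReader {

    // Returns one {url, title, text} array per <document> element.
    static List<String[]> read(String xml) {
        try {
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = dom.getElementsByTagName("document");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.add(new String[] {
                    e.getAttribute("url"), childText(e, "title"), childText(e, "text")
                });
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML bundle", ex);
        }
    }

    // Text of the first child element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }
}
```

Each returned entry would then be handed to the indexer as if it were a separately crawled document.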


Document Factory                
        General
                The minimum properties for each document should be:
                        URL
                        Title
                        Abstract
                        Full Text
                        Score
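
In code, the minimum property set could look roughly like this. The class is a hypothetical sketch, not an existing Lucene type, and treating URL as the only required value is my assumption:

```java
// Hypothetical value class for the minimum document properties listed above.
final class IndexedDocument {
    final String url;
    final String title;
    final String abstractText;  // "abstract" is a reserved word in Java
    final String fullText;
    final float score;

    IndexedDocument(String url, String title, String abstractText,
                    String fullText, float score) {
        // Assumption: every document needs a URL; other properties may be empty.
        if (url == null || url.isEmpty()) {
            throw new IllegalArgumentException("URL is required");
        }
        this.url = url;
        this.title = title == null ? "" : title;
        this.abstractText = abstractText == null ? "" : abstractText;
        this.fullText = fullText == null ? "" : fullText;
        this.score = score;
    }
}
```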

        HTML
                Support for META tags including Dublin Core syntax
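
As a simple illustration of the META tag support, the sketch below pulls name/content pairs out of HTML, including Dublin Core names written as DC.title, DC.creator and so on. It is regex-based and assumes the name attribute comes before content with double-quoted values; a real HTML document factory would want a proper parser. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor for <meta name="..." content="..."> tags.
final class MetaTagExtractor {
    // Matches a meta tag with name before content, both double-quoted.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Returns tag names (lower-cased) mapped to their content, in document order.
    static Map<String, String> extract(String html) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            tags.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return tags;
    }
}
```

The document factory could then map entries such as dc.title or description onto the Title and Abstract properties.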

        Other Possible Document Factories
                Office Docs - DOC, XLS, PPT
                PDF
                

Thanks for the great proposal.

Mark Tucker
                        

-----Original Message-----
From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 07, 2002 5:35 AM
To: Lucene Developers List
Subject: Proposal for Lucene


Hi All,

These are just a few thoughts about Lucene.  Please send me your feedback,
critiques and thoughts.

If you folks would take a look:

http://www.trilug.org/~acoliver/luceneplan.html

if you'd like to submit patches:

http://www.trilug.org/~acoliver/luceneplan.xml

Once I've gotten feedback from the developer community I'll send this to
the user community as well.

Thanks,

Andy
-- 
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
                        - fix java generics!


The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh


--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

