Here it is. Released under APL (I kinda copied and pasted the license from some Fulcrum code). Some (current) limitations:
1. Only a single datasource is supported at this point (support for multiple datasources can easily be added through the configuration file and by improving SearchConfiguration).
2. Documentation isn't really complete. (Is it ever?)
3. It's a filesystem-based indexer. It's not too difficult to decouple the filesystem bit and make it more generic, but I don't have a need for it presently.
4. A temp folder is needed for extracting Zip, GZip and Tar files. I tried using output streams, but they turned out to be quite a nightmare...
5. There's a JDBCDatasource for indexing a table from a database. The table stores metadata of the file to index, so there still has to be some way to obtain the file itself, which ties back to 3. I really ought to provide an example of how to use it...

Questions and feedback are really welcome. I've attached the source-only version, but there's a full version (with libs) at http://www.relevanz.com/search_full.zip.

----- Original Message -----
From: Andrew C. Oliver <[EMAIL PROTECTED]>
To: Lucene Developers List <[EMAIL PROTECTED]>
Sent: Friday, February 08, 2002 9:18 PM
Subject: Re: Proposal for Lucene

> Is this open source? APL'd? Where can I look at it?
>
> -Andy
>
> On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote:
> > Great suggestions all around, and I'm pretty much in agreement with what's been said.
> >
> > In my app, I've built a mini-framework around the searching such that I'm able to map ContentHandlers (which index file contents) to file extensions. I've been wanting to clean it up and contribute it for a while, but haven't overcome the inertia to do so. Also introduced a DataSource (which can pretty much be anything, like a filesystem, a database, a URL, etc) from which to obtain the data to index, so I think it _could_ be in line with what some of you have in mind.
> >
> > I could also use a lot of feedback on what's been done too...
> >
> > So what's the plan to move forward?
> >
> > K
> > ----- Original Message -----
> > From: Mark Tucker
> > To: Lucene Developers List
> > Sent: Friday, February 08, 2002 4:03 AM
> > Subject: RE: Proposal for Lucene
> >
> >
> > I like what you included in your proposal and suggest doing all that (over time) and taking the following into consideration:
> >
> > Indexers/Crawlers
> >
> >   General Settings
> >     SleeptimeBetweenCalls - can be used to avoid flooding a machine with too many requests
> >     IndexerTimeout - kill this crawler thread after long period of inactivity
> >     IncludeFilter - include only items matching filter
> >     ExcludeFilter - exclude items matching filter (can be used with IncludeFilter)
> >     MaxItems - stops indexing after x items
> >     MaxMegs - stops indexing after x MB of data
> >
> >   File System Indexer
> >     URLReplacePrefix - can crawl c:\ but expose URL as http://mysever/docs/
> >
> >   Web Indexer
> >     HTTPUser
> >     HTTPPassword
> >     HTTPUserAgent
> >     ProxyServer
> >     ProxyUser
> >     ProxyPassword
> >     HTTPSCertificate
> >     HTTPSPrivateKey
> >
> >   Other Possible Indexers
> >     Microsoft Exchange 5.5/2000
> >     Lotus Notes
> >     Newsgroup (NNTP)
> >     Documentum
> >     ODBC/OLEDB
> >     XML - index single XML that represents multiple documents
> >
> >
> > Document Factory
> >
> >   General
> >     The minimum properties for each document should be:
> >       URL
> >       Title
> >       Abstract
> >       Full Text
> >       Score
> >
> >   HTML
> >     Support for META tags including Dublin Core syntax
> >
> >   Other Possible Document Factories
> >     Office Docs - DOC, XLS, PPT
> >     PDF
> >
> >
> > Thanks for the great proposal.
> >
> > Mark Tucker
> >
> >
> > -----Original Message-----
> > From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, February 07, 2002 5:35 AM
> > To: Lucene Developers List
> > Subject: Proposal for Lucene
> >
> >
> > Hi All,
> >
> > This is just a few thoughts about Lucene. Please send me your feedback,
> > critiques and thoughts.
> >
> > If you folks would take a look:
> >
> > http://www.trilug.org/~acoliver/luceneplan.html
> >
> > if you'd like to submit patches:
> >
> > http://www.trilug.org/~acoliver/luceneplan.xml
> >
> > Once I've gotten feedback from the developer community I'll send this to
> > the user community as well.
> >
> > Thanks,
> >
> > Andy
> > --
> > www.superlinksoftware.com
> > www.sourceforge.net/projects/poi - port of Excel format to java
> > http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> >   - fix java generics!
> >
> >
> > The avalanche has already started. It is too late for the pebbles to
> > vote.
> > -Ambassador Kosh
> >
> >
> > --
> > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
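
For readers who have not unpacked the attachment: the idea of mapping ContentHandlers to file extensions, mentioned in the quoted thread, could look roughly like the sketch below. This is not taken from search.zip. Only the name ContentHandler comes from the thread; HandlerRegistry, register, lookup and extractText are hypothetical names chosen for illustration, and the code uses present-day Java rather than anything available in 2002.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class HandlerRegistryDemo {

    /** Hypothetical handler interface: turns raw file bytes into indexable text. */
    public interface ContentHandler {
        String extractText(byte[] content);
    }

    /** Hypothetical registry that maps lower-cased file extensions to handlers. */
    public static class HandlerRegistry {
        private final Map<String, ContentHandler> handlers = new HashMap<>();

        /** Registers a handler for an extension given without the dot, e.g. "txt". */
        public void register(String extension, ContentHandler handler) {
            handlers.put(extension.toLowerCase(Locale.ROOT), handler);
        }

        /** Finds the handler for a file name, or empty if there is no usable extension. */
        public Optional<ContentHandler> lookup(String fileName) {
            int dot = fileName.lastIndexOf('.');
            if (dot < 0 || dot == fileName.length() - 1) {
                return Optional.empty(); // no extension, or name ends with a dot
            }
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            return Optional.ofNullable(handlers.get(ext));
        }
    }

    public static void main(String[] args) {
        HandlerRegistry registry = new HandlerRegistry();
        // Plain text needs no parsing: decode the bytes as UTF-8.
        registry.register("txt", content -> new String(content, StandardCharsets.UTF_8));

        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        String text = registry.lookup("notes.TXT")
                .map(handler -> handler.extractText(data))
                .orElse("(no handler registered)");
        System.out.println(text);
    }
}
```

In a real indexer the extracted text would then be added to a Lucene Document; that step is left out here so the sketch stays free of library dependencies.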
search.zip
Description: Zip archive