On Sat, 2002-02-09 at 07:58, Kelvin Tan wrote:
> Here it is. Released under APL (I kinda copied and pasted the license from
> some Fulcrum code). Some (current) limitations:
>
> 1. Only a single datasource is supported at this point in time (support for
> multiple datasources can be easily added through the configuration file and
> improving SearchConfiguration)
> 2. Documentation isn't really complete. (Is it ever?)
> 3. It's a filesystem-based indexer. It's not too difficult to decouple the
> filesystem bit and make it more generic, but I don't have a need for it
> presently.
> 4. A temp folder is needed for extracting Zip, GZip and Tar files. I tried
> using outputstreams but they turned out to be quite a nightmare...

Great, I'll take a look at all of this when I get back next week (going to
Boston for a week, will be out of touch).

> 5. There's a JDBCDatasource for indexing a table from databases (the table
> stores metadata of the file to index. There should still be some way to
> obtain the file to index. This ties back to 3.). I really ought to provide
> an example on how to use it...
>

What's that good for...? Wouldn't one just create an index on the database?

> Questions and feedback are really welcome.
>
> I've attached the source-only version, but there's a full version (with
> libs) at http://www.relevanz.com/search_full.zip.
>
> ----- Original Message -----
> From: Andrew C. Oliver <[EMAIL PROTECTED]>
> To: Lucene Developers List <[EMAIL PROTECTED]>
> Sent: Friday, February 08, 2002 9:18 PM
> Subject: Re: Proposal for Lucene
>
>
> > Is this open source? APL'd? Where can I look at it?
> >
> > -Andy
> >
> > On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote:
> > > Great suggestions all around, and I'm pretty much in agreement with
> > > what's been said.
> > >
> > > In my app, I've built a mini-framework around the searching such that
> > > I'm able to map ContentHandlers (which index file contents) to file
> > > extensions. I've been wanting to clean it up and contribute it for a
> > > while, but haven't overcome the inertia to do so. Also introduced a
> > > DataSource (which can pretty much be anything, like a filesystem, a
> > > database, a URL, etc) from which to obtain the data to index, so I
> > > think it _could_ be in line with what some of you have in mind.
> > >
> > > I could also use a lot of feedback with what's been done too...
> > >
> > > So what's the plan to move forward?
> > >
> > > K
> > > ----- Original Message -----
> > > From: Mark Tucker
> > > To: Lucene Developers List
> > > Sent: Friday, February 08, 2002 4:03 AM
> > > Subject: RE: Proposal for Lucene
> > >
> > >
> > > I like what you included in your proposal and suggest doing all that
> > > (over time) and taking the following into consideration:
> > >
> > > Indexers/Crawlers
> > >
> > >   General Settings
> > >     SleeptimeBetweenCalls - can be used to avoid flooding a machine
> > >       with too many requests
> > >     IndexerTimeout - kill this crawler thread after long period of
> > >       inactivity
> > >     IncludeFilter - include only items matching filter
> > >     ExcludeFilter - exclude items matching filter (can be used with
> > >       IncludeFilter)
> > >     MaxItems - stops indexing after x items
> > >     MaxMegs - stops indexing after x MB of data
> > >
> > >   File System Indexer
> > >     URLReplacePrefix - can crawl c:\ but expose URL as
> > >       http://mysever/docs/
> > >
> > >   Web Indexer
> > >     HTTPUser
> > >     HTTPPassword
> > >     HTTPUserAgent
> > >     ProxyServer
> > >     ProxyUser
> > >     ProxyPassword
> > >     HTTPSCertificate
> > >     HTTPSPrivateKey
> > >
> > >   Other Possible Indexers
> > >     Microsoft Exchange 5.5/2000
> > >     Lotus Notes
> > >     Newsgroup (NNTP)
> > >     Documentum
> > >     ODBC/OLEDB
> > >     XML - index single XML that represents multiple documents
> > >
> > >
> > > Document Factory
> > >
> > >   General
> > >     The minimum properties for each document should be:
> > >       URL
> > >       Title
> > >       Abstract
> > >       Full Text
> > >       Score
> > >
> > >   HTML
> > >     Support for META tags including Dublin Core syntax
> > >
> > >   Other Possible Document Factories
> > >     Office Docs - DOC, XLS, PPT
> > >     PDF
> > >
> > >
> > > Thanks for the great proposal.
> > >
> > > Mark Tucker
> > >
> > >
> > > -----Original Message-----
> > > From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, February 07, 2002 5:35 AM
> > > To: Lucene Developers List
> > > Subject: Proposal for Lucene
> > >
> > >
> > > Hi All,
> > >
> > > These are just a few thoughts about Lucene. Please send me your
> > > feedback, critiques and thoughts.
> > >
> > > If you folks would take a look:
> > >
> > > http://www.trilug.org/~acoliver/luceneplan.html
> > >
> > > if you'd like to submit patches:
> > >
> > > http://www.trilug.org/~acoliver/luceneplan.xml
> > >
> > > Once I've gotten feedback from the developer community I'll send this
> > > to the user community as well.
> > >
> > > Thanks,
> > >
> > > Andy
> > > --
> > > www.superlinksoftware.com
> > > www.sourceforge.net/projects/poi - port of Excel format to java
> > > http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> > > - fix java generics!
> > >
> > >
> > > The avalanche has already started. It is too late for the pebbles to
> > > vote.
> > > -Ambassador Kosh
> > >
> > >
> > > --
> > > To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> > > For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
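
For anyone following the thread, the handler-to-extension mapping Kelvin describes (ContentHandlers mapped to file extensions) could be sketched roughly as below. This is only an illustration under my own assumptions, not his code and not a Lucene API: the `ContentHandler` interface shape, the `Registry` class, `defaultRegistry()` and the crude tag-stripping regex are all hypothetical names and choices made up for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types come from Lucene or from the posted code.
final class HandlerRegistrySketch {

    /** Turns the raw content of a file into the text that should be indexed. */
    interface ContentHandler {
        String extractText(String rawContent);
    }

    /** Maps lower-cased file extensions (without the dot) to handlers. */
    static final class Registry {
        private final Map<String, ContentHandler> handlers = new HashMap<>();

        void register(String extension, ContentHandler handler) {
            handlers.put(extension.toLowerCase(Locale.ROOT), handler);
        }

        /** Returns the handler for the file's extension, or empty if none is registered. */
        Optional<ContentHandler> lookup(String fileName) {
            int dot = fileName.lastIndexOf('.');
            if (dot < 0 || dot == fileName.length() - 1) {
                return Optional.empty(); // no extension at all
            }
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            return Optional.ofNullable(handlers.get(ext));
        }
    }

    /** Assumed defaults: plain text passes through, HTML has its tags crudely stripped. */
    static Registry defaultRegistry() {
        Registry registry = new Registry();
        registry.register("txt", raw -> raw);
        registry.register("html", raw -> raw.replaceAll("<[^>]*>", " ").trim());
        return registry;
    }

    public static void main(String[] args) {
        Registry registry = defaultRegistry();
        String text = registry.lookup("index.HTML")
                .map(handler -> handler.extractText("<p>hello</p>"))
                .orElse("(no handler)");
        System.out.println(text); // prints "hello"
    }
}
```

Files with an unknown or missing extension simply get no handler here; whether such files should be skipped or indexed by name only is a design choice the thread leaves open.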