----- Original Message -----
From: "Andrew C. Oliver" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Sent: Monday, February 25, 2002 12:48 AM
Subject: Re: Proposal for Lucene

> Wow this is an awesome starting point! I'm awed!
> The object model is
> nice and abstracted and yet clean and simple.. I only scanned it but I
> already feel like I understand it. Are you okay with us putting this in
> a scratchpad area in lucene repository (I gather "yes") and refactoring
> it as a starting point?

I'd be more than happy if you could do that. It would be nice if Lucene had
the equivalent of the commons-sandbox or turbine-stratum, a workplace kind-of.

Regards,
Kelvin

>
> Has anyone else looked at this? Any objections?
>
> -Andy
>
>
> On Sat, 2002-02-09 at 07:58, Kelvin Tan wrote:
> > Here it is. Released under APL (I kinda copied and pasted the license from
> > some Fulcrum code). Some (current) limitations:
> >
> > 1. Only a single datasource is supported at this point in time (support for
> > multiple datasources can be easily added through the configuration file and
> > improving SearchConfiguration)
> > 2. Documentation isn't really complete. (Is it ever?)
> > 3. It's a filesystem-based indexer. It's not too difficult to decouple the
> > filesystem bit and make it more generic, but I don't have a need for it
> > presently.
> > 4. A temp folder is needed for extracting Zip, GZip and Tar files. I tried
> > using outputstreams but they turned out to be quite a nightmare...
> > 5. There's a JDBCDatasource for indexing a table from databases (the table
> > stores metadata of the file to index. There should still be some way to
> > obtain the file to index. This ties back to 3.). I really ought to provide
> > an example on how to use it...
> >
> > Questions and feedback are really welcome.
> >
> > I've attached the source-only version, but there's a full version (with
> > libs) at http://www.relevanz.com/search_full.zip.
> >
> > ----- Original Message -----
> > From: Andrew C. Oliver <[EMAIL PROTECTED]>
> > To: Lucene Developers List <[EMAIL PROTECTED]>
> > Sent: Friday, February 08, 2002 9:18 PM
> > Subject: Re: Proposal for Lucene
> >
> >
> > > Is this open source? APL'd? Where can I look at it?
> > >
> > > -Andy
> > >
> > > On Thu, 2002-02-07 at 20:27, Kelvin Tan wrote:
> > > > Great suggestions all around, and I'm pretty much in agreement with
> > > > what's been said.
> > > >
> > > > In my app, I've built a mini-framework around the searching such that
> > > > I'm able to map ContentHandlers (which index file contents) to file
> > > > extensions. I've been wanting to clean it up and contribute it for a
> > > > while, but haven't overcome the inertia to do so. Also introduced a
> > > > DataSource (which can pretty much be anything, like a filesystem, a
> > > > database, a URL, etc) from which to obtain the data to index, so I
> > > > think it _could_ be in line with what some of you have in mind.
> > > >
> > > > I could also use a lot of feedback on what's been done too...
> > > >
> > > > So what's the plan to move forward?
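
The mapping of ContentHandlers to file extensions described above might look
roughly like the sketch below. It is not taken from the attached source: the
ContentHandler interface shape, the ContentHandlerRegistry class, and every
method name are assumptions made up for illustration only.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical interface: extracts indexable text from one kind of file. */
interface ContentHandler {
    /** Returns the text that should be indexed for the given file. */
    String extractText(File file);
}

/** Hypothetical registry that maps file extensions to ContentHandlers. */
class ContentHandlerRegistry {
    private final Map<String, ContentHandler> handlers = new HashMap<>();

    /** Registers a handler for an extension such as "txt" or ".TXT". */
    void register(String extension, ContentHandler handler) {
        handlers.put(normalize(extension), handler);
    }

    /** Returns the handler for the file's extension, or null if none is registered. */
    ContentHandler handlerFor(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null; // no extension at all, or a trailing dot
        }
        return handlers.get(normalize(name.substring(dot + 1)));
    }

    /** Strips a leading dot and lower-cases, so ".TXT" and "txt" are the same key. */
    private static String normalize(String extension) {
        String ext = extension.startsWith(".") ? extension.substring(1) : extension;
        return ext.toLowerCase(Locale.ROOT);
    }
}
```

An indexer built this way would ask the registry for a handler for each file it
gets from the DataSource and skip files for which handlerFor returns null.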
> > > >
> > > > K
> > > >   ----- Original Message -----
> > > >   From: Mark Tucker
> > > >   To: Lucene Developers List
> > > >   Sent: Friday, February 08, 2002 4:03 AM
> > > >   Subject: RE: Proposal for Lucene
> > > >
> > > >
> > > >   I like what you included in your proposal and suggest doing all that
> > > >   (over time) and taking the following into consideration:
> > > >
> > > >   Indexers/Crawlers
> > > >
> > > >     General Settings
> > > >       SleeptimeBetweenCalls - can be used to avoid flooding a machine
> > > >         with too many requests
> > > >       IndexerTimeout - kill this crawler thread after long period of
> > > >         inactivity
> > > >       IncludeFilter - include only items matching filter
> > > >       ExcludeFilter - exclude items matching filter (can be used with
> > > >         IncludeFilter)
> > > >       MaxItems - stops indexing after x items
> > > >       MaxMegs - stops indexing after x MB of data
> > > >
> > > >     File System Indexer
> > > >       URLReplacePrefix - can crawl c:\ but expose URL as
> > > >         http://mysever/docs/
> > > >
> > > >     Web Indexer
> > > >       HTTPUser
> > > >       HTTPPassword
> > > >       HTTPUserAgent
> > > >       ProxyServer
> > > >       ProxyUser
> > > >       ProxyPassword
> > > >       HTTPSCertificate
> > > >       HTTPSPrivateKey
> > > >
> > > >     Other Possible Indexers
> > > >       Microsoft Exchange 5.5/2000
> > > >       Lotus Notes
> > > >       Newsgroup (NNTP)
> > > >       Documentum
> > > >       ODBC/OLEDB
> > > >       XML - index single XML that represents multiple documents
> > > >
> > > >
> > > >   Document Factory
> > > >     General
> > > >       The minimum properties for each document should be:
> > > >         URL
> > > >         Title
> > > >         Abstract
> > > >         Full Text
> > > >         Score
> > > >
> > > >     HTML
> > > >       Support for META tags including Dublin Core syntax
> > > >
> > > >     Other Possible Document Factories
> > > >       Office Docs - DOC, XLS, PPT
> > > >       PDF
> > > >
> > > >
> > > >   Thanks for the great proposal.
> > > >
> > > >   Mark Tucker
> > > >
> > > >
> > > >   -----Original Message-----
> > > >   From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
> > > >   Sent: Thursday, February 07, 2002 5:35 AM
> > > >   To: Lucene Developers List
> > > >   Subject: Proposal for Lucene
> > > >
> > > >
> > > >   Hi All,
> > > >
> > > >   These are just a few thoughts about Lucene. Please send me your
> > > >   feedback, critiques and thoughts.
> > > >
> > > >   If you folks would take a look:
> > > >
> > > >   http://www.trilug.org/~acoliver/luceneplan.html
> > > >
> > > >   if you'd like to submit patches:
> > > >
> > > >   http://www.trilug.org/~acoliver/luceneplan.xml
> > > >
> > > >   Once I've gotten feedback from the developer community I'll send
> > > >   this to the user community as well.
> > > >
> > > >   Thanks,
> > > >
> > > >   Andy
> > > >   --
> > > >   www.superlinksoftware.com
> > > >   www.sourceforge.net/projects/poi - port of Excel format to java
> > > >   http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> > > >   - fix java generics!
> > > >
> > > >
> > > >   The avalanche has already started. It is too late for the pebbles to
> > > >   vote.
> > > >   -Ambassador Kosh
> > > >
> > > >
> > > >   --
> > > >   To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> > > >   For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
> --
> http://www.superlinksoftware.com
> http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document
> format to java
> http://developer.java.sun.com/developer/bugParade/bugs/4487555.html
> - fix java generics!
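
For the general crawler settings Mark lists (IncludeFilter, ExcludeFilter,
MaxItems, MaxMegs), here is a minimal sketch of how an indexer might apply
them. The thread does not say what form the filters take, so treating them as
regular expressions is an assumption, and the CrawlLimits class and its method
names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for some of the "General Settings" listed in the thread. */
class CrawlLimits {
    private final Pattern includeFilter; // null means "include everything"
    private final Pattern excludeFilter; // null means "exclude nothing"
    private final int maxItems;          // MaxItems: stop after this many items
    private final long maxBytes;         // MaxMegs, converted to bytes
    private int itemsIndexed = 0;
    private long bytesIndexed = 0;

    /** Assumption: both filters are regular expressions matched against the whole name. */
    CrawlLimits(String includeRegex, String excludeRegex, int maxItems, long maxMegs) {
        this.includeFilter = includeRegex == null ? null : Pattern.compile(includeRegex);
        this.excludeFilter = excludeRegex == null ? null : Pattern.compile(excludeRegex);
        this.maxItems = maxItems;
        this.maxBytes = maxMegs * 1024L * 1024L;
    }

    /** True if the name passes both filters and neither limit has been reached yet. */
    boolean accepts(String name) {
        if (itemsIndexed >= maxItems || bytesIndexed >= maxBytes) {
            return false;
        }
        if (includeFilter != null && !includeFilter.matcher(name).matches()) {
            return false;
        }
        if (excludeFilter != null && excludeFilter.matcher(name).matches()) {
            return false;
        }
        return true;
    }

    /** Call after an item has been indexed so the limits can be enforced. */
    void recordIndexed(long sizeInBytes) {
        itemsIndexed++;
        bytesIndexed += sizeInBytes;
    }
}
```

The exclude filter is checked after the include filter, which matches the note
that ExcludeFilter can be used together with IncludeFilter. SleeptimeBetweenCalls
and IndexerTimeout are left out because they concern thread scheduling rather
than item selection.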