Greg, On Wednesday 28 April 2004 21:44, Greg Conway wrote: > Hello. Apologies if this has come up before, I'm new to the list and > didn't see anything in the archives that exactly matched my situation.
It has, but each situation is different. Try this: http://jakarta.apache.org/lucene/docs/benchmarks.html > I am considering using Lucene to index and search a large collection of > small documents in a specialized domain -- probably only a few > > thousands unique terms spanning across anywhere from one million to ten > million small source documents. I hope to be able to get ranked search > results back in less than 400 msec. > > I suspect one issue I may face is index density owing to the large > numbers of documents and relatively small vocabulary. That, in turn, > may be a drag on query processing. I am working on strategies to > ameliorate that somewhat but it may be difficult. A text search engine is your best bet in this situation. > In the meantime, I'm looking for some gut reactions from the experts > before I take this to the next stage. Can Lucene scale well to this > kind of situation? Can I realistically hope to get anywhere near my Yes. > performance targets? Will I have to distribute pieces of the index Yes. > across several machines, parallelize my retrievals, and merge the That's more difficult to say. You'll need to try. > results to do so? If so, does Lucene already support that or will I Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search. (See the javadoc on the website) But first make sure that the Analyzer you use for indexing fits your needs. > have to develop that logic in house? (Seems like I saw a reference No. > somewhere that such a feature was coming soon, but I'm not sure when or > how it will be implemented.) Have fun, Ype --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
