Greg,

On Wednesday 28 April 2004 21:44, Greg Conway wrote:
> Hello.  Apologies if this has come up before, I'm new to the list and
> didn't see anything in the archives that exactly matched my situation.

It has, but each situation is different. Try the benchmarks page:
http://jakarta.apache.org/lucene/docs/benchmarks.html

> I am considering using Lucene to index and search a large collection of
> small documents in a specialized domain -- probably only a few
> thousand unique terms spanning across anywhere from one million to ten
> million small source documents.  I hope to be able to get ranked search
> results back in less than 400 msec.
>
> I suspect one issue I may face is index density owing to the large
> numbers of documents and relatively small vocabulary.  That, in turn,
> may be a drag on query processing.  I am working on strategies to
> ameliorate that somewhat but it may be difficult.

A text search engine is your best bet in this situation.

> In the meantime, I'm looking for some gut reactions from the experts
> before I take this to the next stage.  Can Lucene scale well to this
> kind of situation?  Can I realistically hope to get anywhere near my

Yes.

> performance targets?  Will I have to distribute pieces of the index

Yes.

> across several machines, parallelize my retrievals, and merge the

That's more difficult to say. You'll need to try.

> results to do so?  If so, does Lucene already support that or will I

Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search
(the javadoc is on the website).
But first make sure that the Analyzer you use for indexing fits your needs.
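
To give a feel for the "merge the results" step, here is a rough sketch in
plain Java. It is not Lucene code and not how MultiSearcher is implemented:
the ShardMergeSketch class, the Hit type and mergeTopK are names made up for
illustration. It assumes each sub-index returns its hits already sorted by
descending score, and that scores are comparable across sub-indexes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: none of these types belong to Lucene.
final class ShardMergeSketch {

    /** One scored hit as a single sub-index might report it. */
    static final class Hit {
        final int shard;    // label of the sub-index that produced the hit
        final int doc;      // document number local to that sub-index
        final float score;  // relevance score, higher is better

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Merges per-shard hit lists into the overall top k.
     * Each inner list must already be sorted by descending score.
     */
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        // Heap entries are {listIndex, position}. The entry whose current
        // hit has the highest score is polled first; ties go to the lower
        // list index so the result is deterministic.
        Comparator<int[]> bestFirst = (a, b) -> {
            Hit ha = perShard.get(a[0]).get(a[1]);
            Hit hb = perShard.get(b[0]).get(b[1]);
            int c = Float.compare(hb.score, ha.score);
            return c != 0 ? c : Integer.compare(a[0], b[0]);
        };
        PriorityQueue<int[]> heads = new PriorityQueue<>(bestFirst);

        // Seed the heap with the best hit of every non-empty list.
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }

        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heads.isEmpty()) {
            int[] head = heads.poll();
            List<Hit> hits = perShard.get(head[0]);
            merged.add(hits.get(head[1]));
            // Advance the cursor of the list we just consumed from.
            if (head[1] + 1 < hits.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

Calling mergeTopK with the hit lists of two sub-indexes and k = 3 returns the
three best hits overall, whichever sub-index they came from. Only the head of
each list sits in the heap at any time, so the cost grows with k and the
number of sub-indexes, not with the total number of hits.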

> have to develop that logic in house?  (Seems like I saw a reference

No.

> somewhere that such a feature was coming soon, but I'm not sure when or
> how it will be implemented.)

Have fun,
Ype

