Chris Fraschetti wrote:
I've seen throughout the list mentions of millions of documents.. 8
million, 20 million, etc etc.. but can lucene potentially handle
billions of documents and still efficiently search through them?

Lucene can currently handle up to 2^31 documents in a single index. To a large degree this is limited by Java ints and arrays (which are accessed by ints). There are also a few places where the file format limits things to 2^32.
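
To make those two ceilings concrete, here is a small sketch of the arithmetic. The class and method names are made up for illustration and are not part of Lucene; the only assumption is the one stated above, that document numbers are Java ints.

```java
public class DocIdLimits {
    // Largest value a Java int can hold: 2^31 - 1. Arrays are indexed by int,
    // so this also bounds the length of any per-document array.
    static long maxIntValue() {
        return Integer.MAX_VALUE;
    }

    // Number of distinct values in an unsigned 32-bit field: 2^32.
    static long unsigned32BitRange() {
        return 1L << 32;
    }

    public static void main(String[] args) {
        System.out.println("int ceiling:    " + maxIntValue());
        System.out.println("32-bit ceiling: " + unsigned32BitRange());
    }
}
```

So even if the int-based limits were lifted, the 32-bit spots in the file format would only buy one more doubling.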

On typical PC hardware, a 2-3 word search of an index with 10M documents, each with around 10 KB of text, takes around 1 second, including index I/O time. Search time is more-or-less linear in index size, so a 100M document index might require nearly 10 seconds per search. Thus, as indexes grow, folks tend to distribute searches in parallel across many smaller indexes. That's what Nutch and Google do.
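
A rough sketch of that approach, in plain Java rather than Lucene's own API: the Shard interface, the Hit class and the method names are all hypothetical stand-ins, and the estimate simply extrapolates the 10M-documents-per-second figure above. Each smaller index is searched on its own thread and the per-index results are merged by score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedSearchSketch {
    /** One scored result; shard + doc identify the document (hypothetical type). */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;
        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for one of the smaller indexes. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Linear extrapolation from the figure above: about 1 second per 10M documents. */
    static double estimateSeconds(long docs) {
        return docs / 10_000_000.0;
    }

    /** Searches every shard in parallel, then keeps the topN hits by score. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (Shard s : shards) {
                tasks.add(() -> s.search(query, topN));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get()); // waits for that shard to finish
            }
            // Highest score first.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Under the linear assumption, a 100M document collection split into ten 10M indexes would take roughly estimateSeconds(10_000_000) = 1 second per search instead of 10, plus whatever the merge and any network hops cost. Merging by raw score also assumes scores are comparable across indexes, which may not hold if each index computes its own term statistics.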

