Chris Fraschetti wrote:
I've seen throughout the list mentions of millions of documents.. 8
million, 20 million, etc etc.. but can lucene potentially handle
billions of documents and still efficiently search through them?

Lucene can currently handle up to 2^31 documents in a single index. To a large degree this is limited by Java ints and arrays (which are accessed by ints). There are also a few places where the file format limits things to 2^32.


On typical PC hardware, 2-3 word searches of an index with 10M documents, each with around 10k of text, require around 1 second, including index i/o time. Performance is more-or-less linear, so that a 100M document index might require nearly 10 seconds per search. Thus, as indexes grow folks tend to distribute searches in parallel to many smaller indexes. That's what Nutch and Google (http://www.computer.org/micro/mi2003/m2022.pdf) do.

Doug


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to