-----Original Message-----
From: Leo Galambos [mailto:[EMAIL PROTECTED]]
Sent: Saturday, December 21, 2002 9:36 AM
To: Lucene Users List
Subject: Re: Lucene Benchmarks and Information

[snip]
> IMHO it is a bug and the point why Lucene does not scale well on huge
> collections of documents. I am talking about my previous tests when I
> used a live index and concurrent query+insert+delete (I wanted to
> simulate a real application).

[snip]

What is your definition of huge? I have yet to have a problem, and I am
running one of the biggest indexes that I have seen posted to the mailing
list. I've been very impressed with the way that Lucene scales. Apparently
I was not on the mailing list when you posted these tests. (I'm still
fairly new.)

> BTW, your mail is also an answer to the previous topic "how often could
> one call optimize()". The method would be called before the index goes
> to its production state. And it also means that tests are irrelevant
> until they are made with a lower mergeFactor.

[snip]

Maybe "irrelevant" to you, but I didn't intend my exercise to be a
benchmark of how fast I could make Lucene index, as there are a lot of
things I could have done to make it faster. (And I ended up learning
several more via the experiment and the follow-up discussion here.)
Maybe "Benchmarks" is a bad word to have in the subject. The tests were
done so that:

A. I know that there is no limitation (that will affect me) in Lucene
   (hard-coded, bug, or design-wise) as to how many documents can be put
   into an index. That's why I built this ~43 million document index.
   Just to see if I could.
B. I know the impact on search times of adding more documents.
C. I know I can search an index of this size without running into
   problems.

I would imagine any benchmark that says "I can index X documents this
fast" is fairly irrelevant to anyone else using different hardware, as it
varies too much based on disk speed, platform, CPU, document size,
document format (in my real apps I'm doing XML transformations), how
dedicated the machine is, JVM, etc.

The results were posted to the list so that the question "I just found
Lucene. It looks nice, but can it handle 30 (or more) million documents?"
can be answered matter-of-factly for others in the future. Additionally,
they serve as a *very* rough guide to the amount of hardware you would
need to construct an index of X documents in Y amount of time.

Dan
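For anyone picking up the optimize()/mergeFactor point above later, here is
a minimal sketch of the bulk-load pattern being discussed: index with a
modest mergeFactor, then call optimize() once before the index goes into
production rather than optimizing during the load. It assumes the Lucene
1.x API of the time (public mergeFactor field, Field.Text/Field.Keyword);
the index path, field names, and document source are made up for
illustration.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    import java.io.IOException;

    // Bulk-load documents, then optimize() once before the index is
    // put into production.
    public class BulkIndexer {

        public static void main(String[] args) throws IOException {
            // true = create a new index at this (hypothetical) path
            IndexWriter writer = new IndexWriter("/tmp/bulk-index",
                                                 new StandardAnalyzer(),
                                                 true);

            // mergeFactor controls how many segments accumulate before a
            // merge: higher values index faster but leave more segments
            // (and open files); lower values keep the live index tidier.
            writer.mergeFactor = 10;

            // Stand-in for the real document source (files, XML, DB, ...)
            for (int i = 0; i < 1000; i++) {
                Document doc = new Document();
                doc.add(Field.Keyword("id", String.valueOf(i)));
                doc.add(Field.Text("contents",
                                   "body text for document " + i));
                writer.addDocument(doc);
            }

            // One optimize() pass after the load, so searchers hit a
            // single merged segment instead of many small ones.
            writer.optimize();
            writer.close();
        }
    }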
