-----Original Message-----
From: Leo Galambos [mailto:[EMAIL PROTECTED]] 
Sent: Saturday, December 21, 2002 9:36 AM
To: Lucene Users List
Subject: Re: Lucene Benchmarks and Information
[snip]

>IMHO it is a bug and the
>point why Lucene does not scale well on huge collections of documents. I
>am talking about my previous tests when I used live index and concurrent
>query+insert+delete (I wanted to simulate real application).

[snip]

What is your definition of huge?  I have yet to run into a problem, and I am running one 
of the biggest indexes that I have seen posted to the mailing list.  I've been very 
impressed with the way that Lucene scales.  Apparently I was not on the mailing list 
when you posted these tests.  (I'm still fairly new.)


>BTW, your mail is also an answer to previous topic "how often could one
>call optimize()". The method would be called before the index goes to
>production state. And it also means that tests are irrelevant until they
>are made with lower mergeFactor.

[snip]
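
(Aside for anyone following along: the optimize()/mergeFactor tuning being discussed 
looks roughly like this against the Lucene 1.x Java API.  This is only a sketch, not 
the code from either of our tests; the index path and the bare addDocument() 
placeholder are made up.)

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class BuildIndex {
        public static void main(String[] args) throws Exception {
            // true = create a new index at this (made-up) path
            IndexWriter writer = new IndexWriter("/tmp/testindex",
                    new StandardAnalyzer(), true);

            // Lower mergeFactor = fewer segments and open files, slower indexing;
            // higher = faster indexing, more segments piling up between merges.
            writer.mergeFactor = 10;

            // ... writer.addDocument(doc) calls for the collection go here ...

            // Collapse everything into a single segment before the index
            // goes into production / read-mostly use.
            writer.optimize();
            writer.close();
        }
    }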

Maybe "irrelevant" to you, but I didn't intend my exercise to be a benchmark as to how 
fast I could make Lucene Index, as there are a lot of things that I could have done to 
make it faster.  (And I ended up learning several more via the experiment and follow 
up discussion here)  Maybe "Benchmarks" is a bad word to have in the subject.  They 
were done so that 

A.  I know there is no limitation in Lucene (hardcoded, a bug, or design-wise) that 
will affect me as to how many documents can be put into an index.  That's why I 
built this ~43 million document index.  Just to see if I could.

B.  I know the impact of adding more documents on search times (a rough timing sketch 
follows this list).

C.  I know I can search an index of this size without running into problems.


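For B and C, a rough sketch of the kind of timing involved, again against the 
Lucene 1.x API (the index path, the field name "contents", and the query string are 
made up; this is not the exact harness I used):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class TimeSearch {
        public static void main(String[] args) throws Exception {
            // Open a searcher over an existing (large) index.
            IndexSearcher searcher = new IndexSearcher("/data/bigindex");
            Query query = QueryParser.parse("some query terms", "contents",
                    new StandardAnalyzer());

            // Wall-clock time around the actual search call.
            long start = System.currentTimeMillis();
            Hits hits = searcher.search(query);
            long elapsed = System.currentTimeMillis() - start;

            System.out.println(hits.length() + " hits in " + elapsed + " ms");
            searcher.close();
        }
    }
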
I would imagine any benchmark that says "I can index X documents this fast" is fairly 
irrelevant to anyone else using different hardware, since it varies too much based on 
disk speed, platform, CPU, document size, document format (in my real apps I'm doing 
XML transformations), how dedicated the machine is, the JVM, and so on.

The results were posted to the list so that the question 

"I just found Lucene.  It looks nice, but can it handle 30 (or more) million 
documents?"

can be answered matter-of-factly for others in the future.  Additionally, they serve as 
a *very* rough guide to the amount of hardware you would need to construct your index 
of X documents in Y amount of time.

Dan

 
