Yeah, I haven't played with millions of documents yet. We will need a
bigger test collection, I think! Although the benchmarker can add as
many docs as you want from the same source, index compression will
possibly affect the results more than a bigger collection with all
unique docs would. Maybe it is time to look at adding Wikipedia as a
test collection. I think there are something like 18+ million docs in it.
On Mar 23, 2007, at 4:01 PM, Doug Cutting wrote:
Michael McCandless wrote:
Also, one caveat: whenever #docs (21578 for Reuters) divided by
maxBufferedDocs is less than mergeFactor, no merges will take place
during your runs. This greatly skews the results.
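To make that concrete: with mergeFactor=10, any maxBufferedDocs above
roughly 21578 / 10 ~= 2158 leaves you with fewer than ten flushed
segments, so nothing ever merges. Below is a minimal sketch of a setup
where merging does happen during the run, using the Lucene 2.x
IndexWriter setters; the index path, field name, and document text are
placeholder assumptions, not anything from the benchmark itself:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class MergeConfigSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/tmp/bench-index",    // placeholder path
                                         new StandardAnalyzer(),
                                         true);                 // create a fresh index
    writer.setMaxBufferedDocs(100);  // flush a new segment every 100 docs
    writer.setMergeFactor(10);       // merge whenever 10 segments pile up at a level

    // 21578 docs / 100 buffered = ~216 flushed segments, well above the
    // merge factor, so merge cost actually shows up in the measurement.
    for (int i = 0; i < 21578; i++) {
      Document doc = new Document();
      doc.add(new Field("body", "placeholder text for doc " + i,
                        Field.Store.NO, Field.Index.TOKENIZED));
      writer.addDocument(doc);
    }
    writer.close();
  }
}

Keeping maxBufferedDocs small relative to the collection size is the
easy way to force merging into the benchmark window.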
Also, my guess is that this index fits entirely in the buffer
cache. Things behave quite differently when segments are larger
than available memory and merging requires lots of disk i/o.
Doug
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ