Yeah, I haven't played with millions of documents yet. We will need a bigger test collection, I think! Although the benchmarker can add as many docs as you want from the same source, index compression of the repeated content will possibly affect the results more than a bigger collection of all unique docs would.

Maybe it is time to look at adding Wikipedia as a test collection. I think there are something like 18+ million docs in it.

On Mar 23, 2007, at 4:01 PM, Doug Cutting wrote:

Michael McCandless wrote:
Also, one caveat: whenever #docs (21578 for Reuters) divided by maxBufferedDocs is less than mergeFactor, no merges will take place during your runs. This greatly skews the results.

Also, my guess is that this index fits entirely in the buffer cache. Things behave quite differently when segments are larger than available memory and merging requires lots of disk i/o.
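For concreteness, here is a minimal sketch of the no-merge condition Michael describes, against the 2.x IndexWriter API. The class name and the specific settings are mine, purely illustrative:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    public class MergeSkewDemo {
      public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(new RAMDirectory(),
                                             new StandardAnalyzer(), true);
        // 21578 Reuters docs flushed in batches of 10000 produce
        // ceil(21578 / 10000) = 3 on-disk segments.
        writer.setMaxBufferedDocs(10000);
        // A merge only triggers once mergeFactor (10) segments pile up
        // at the same level; 3 < 10, so this run never pays merge cost.
        writer.setMergeFactor(10);
        // ... add the 21578 documents here ...
        writer.close();
      }
    }

Dropping maxBufferedDocs to, say, 1000 would flush roughly 22 segments and force merging to kick in, so merge cost would actually show up in the timings.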

Doug



--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


