Create merge policy that doesn't periodically inadvertently optimize --------------------------------------------------------------------
Key: LUCENE-854 URL: https://issues.apache.org/jira/browse/LUCENE-854 Project: Lucene - Java Issue Type: New Feature Components: Index Affects Versions: 2.2 Reporter: Michael McCandless Assigned To: Michael McCandless Priority: Minor Fix For: 2.2 The current merge policy, at every maxBufferedDocs * power-of-mergeFactor docs added, will do a fully cascaded merge, which is the same as an optimize. I think this is not good because at that "optimization poin", the particular addDocument call is [surprisingly] very expensive. While, amortized over all addDocument calls, the cost is low, the cost is paid "up front" and in a very "bunched up" manner. I think of this as "pay it forward": you are paying the full cost of an optimize right now on the expectation / hope that you will be adding a great many more docs. But, if you don't add that many more docs, then, the amortized cost for your index is in fact far higher than it should have been. Better to "pay as you go" instead. So we could make a small change to the policy by only merging the first mergeFactor segments once we hit 2X the merge factor. With mergeFactor=10, when we have created the 20th level 0 (just flushed) segment, we merge the first 10 into a level 1 segment. Then on creating another 10 level 0 segments, we merge the second set of 10 level 0 segments into a level 1 segment, etc. With this new merge policy, an index that's a bit bigger than a current "optimization point" would then have a lower amortized cost per document. Plus the merge cost is less "bunched up" and less "pay it forward": instead you pay for what you are actually using. We can start by creating this merge policy (probably, combined with with the "by size not by doc count" segment level computation from LUCENE-845) and then later decide whether we should make it the default merge policy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]