"Ning Li" <[EMAIL PROTECTED]> wrote: > On 3/31/07, Michael McCandless (JIRA) <[EMAIL PROTECTED]> wrote: > > Create merge policy that doesn't periodically inadvertently optimize > > -------------------------------------------------------------------- > > So we could make a small change to the policy by only merging the > > first mergeFactor segments once we hit 2X the merge factor. With > > mergeFactor=10, when we have created the 20th level 0 (just flushed) > > segment, we merge the first 10 into a level 1 segment. Then on > > creating another 10 level 0 segments, we merge the second set of 10 > > level 0 segments into a level 1 segment, etc. > > Hi Mike, > > When a 20th level 0 segment triggers a 20th level 1 segment which > triggers a 20th level 2 segment... we are still optimizing, aren't we? > Am I missing something here?

The merge would "cascade" in this case, but it would not optimize: you
would still have more than one segment in the end. Each time you cascade
you only merge the first 10 segments at each level, so after the cascade
in your example you would have 1 level 3 segment, 10 level 2 segments,
10 level 1 segments and 10 level 0 segments.

I'm actually using this merge policy in my patch for LUCENE-843 when
merging the flushed "partial" segments. This is only used when
IndexWriter is opened with autoCommit=false and you add lots and lots
of documents (so RAM flushes many times).

But I like the merge policy proposed at the end of LUCENE-845 even
better for Lucene's normal merges. It would merge based on size (not
# docs), would be free to merge adjacent segments (not just rightmost
segments), and would merge N (configurable) segments at a time. The
part that's still unclear is when it should "trigger" a merge and how
specifically it picks which N segments to merge (maybe: the series of N
adjacent segments that are "most similar" in size, but favoring smaller
segments over larger ones).

Mike

Michael McCandless [EMAIL PROTECTED]
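
To make the cascade concrete, here is a minimal Python sketch (not Lucene code) that only counts segments per level under the policy quoted above: a level is merged once it holds 2X mergeFactor segments, and only its first mergeFactor segments are merged into one segment at the next level. The function names and the list-of-counts representation are assumptions made for illustration; segment sizes and document counts are ignored.

```python
def add_flushed_segment(levels, merge_factor=10):
    """Add one level 0 segment, then cascade merges upward.

    levels[i] is the number of segments currently at level i.
    Hypothetical model: a level merges only when it reaches
    2 * merge_factor segments, and only merge_factor of them
    are merged into a single segment at the next level.
    """
    if not levels:
        levels.append(0)
    levels[0] += 1
    level = 0
    while levels[level] >= 2 * merge_factor:
        # Merge the first merge_factor segments of this level...
        levels[level] -= merge_factor
        # ...into one new segment at the next level up.
        if level + 1 == len(levels):
            levels.append(0)
        levels[level + 1] += 1
        level += 1  # the new segment may trigger a merge there too
    return levels


def simulate(num_flushes, merge_factor=10):
    """Return per-level segment counts after num_flushes flushes."""
    levels = []
    for _ in range(num_flushes):
        add_flushed_segment(levels, merge_factor)
    return levels


# In this model the first level 3 segment appears on flush 2110.
print(simulate(2110))  # [10, 10, 10, 1]
```

In this model the full cascade leaves 31 segments rather than 1, which is the sense in which the policy cascades without optimizing.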