"Ning Li" <[EMAIL PROTECTED]> wrote: > > It would merge based on size (not # docs), would be free to merge > > adjacent segments (not just rightmost segments), and would merge N > > (configurable) at a time. The part that's still unclear is how it > > chooses when to "trigger" a merge and how specifically it picks which > > N segments to merge (maybe: the series of N adjacent segments that are > > "most similar" in size, but favoring smaller segments over larger > > ones). > > Those two are very good questions. It's a challenge to make it work in > all case. One example is the sandwich case, where two large segments > sandwich a small one. I'll think about it... It'd be even better if we > can take deletes into consideration: it's more beneficial to merge a > segment with more deletes. Right now, we have to open an IndexReader > to get the number of deletes. We could store that in segments file if > we decide IndexWriter/MergePolicy will need that...

Yes, the sandwich case would be challenging. But how would you get to the sandwich case in the first place? I guess if RAM had flushed that way, or if many deletes accumulated on the middle one. But I don't think merging would tend to produce sandwich cases itself (since it would have merged that middle one).

I like your idea to keep "delete count per segment" in the segments file. This information is certainly useful to the merge policy: it should proportionally reduce a segment's size according to what percentage of its docs are deleted, and it should favor merging segments with a high # of deletes to free up the storage.

Mike