"Ning Li" <[EMAIL PROTECTED]> wrote:

> > It would merge based on size (not # docs), would be free to merge
> > adjacent segments (not just rightmost segments), and would merge N
> > (configurable) at a time.  The part that's still unclear is how it
> > chooses when to "trigger" a merge and how specifically it picks which
> > N segments to merge (maybe: the series of N adjacent segments that are
> > "most similar" in size, but favoring smaller segments over larger
> > ones).
> 
> Those two are very good questions. It's a challenge to make it work in
> all cases. One example is the sandwich case, where two large segments
> sandwich a small one. I'll think about it... It'd be even better if we
> can take deletes into consideration: it's more beneficial to merge a
> segment with more deletes. Right now, we have to open an IndexReader
> to get the number of deletes. We could store that in segments file if
> we decide IndexWriter/MergePolicy will need that...

Yes, the sandwich case would be challenging. But how would you get
to the sandwich case in the first place?  I guess if RAM had flushed
that way, or if many deletes accumulated on the middle one.  But I
don't think merging would tend to produce sandwich cases itself (since
it would have merged that middle one).
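The selection rule floated above (merge N adjacent segments that are "most similar" in size, favoring smaller ones) could be sketched roughly as below. This is a hypothetical illustration, not Lucene's MergePolicy: the class and method names are made up, "similarity" is assumed to mean the ratio of the largest to the smallest segment in a window, and "favoring smaller" is assumed to mean breaking ties toward the smaller total size.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: among all runs of n adjacent segments, pick the run
 * whose sizes are most similar, preferring the smaller run on a tie.
 * Not Lucene's actual MergePolicy.
 */
public class AdjacentMergeSelector {

    /**
     * Returns the start index of the chosen window of n adjacent segments
     * (n >= 1), or -1 if there are fewer than n segments.
     *
     * @param sizes segment sizes in bytes, in index order
     */
    static int selectWindow(long[] sizes, int n) {
        int best = -1;
        double bestSkew = Double.MAX_VALUE;
        long bestTotal = Long.MAX_VALUE;
        for (int start = 0; start + n <= sizes.length; start++) {
            long min = Long.MAX_VALUE;
            long max = 0;
            long total = 0;
            for (int i = start; i < start + n; i++) {
                min = Math.min(min, sizes[i]);
                max = Math.max(max, sizes[i]);
                total += sizes[i];
            }
            // Skew of 1.0 means every segment in the window has the same size;
            // Math.max(min, 1) guards against division by zero for empty segments.
            double skew = (double) max / Math.max(min, 1);
            // Prefer the least skewed window; on a tie, prefer the smaller total size.
            if (skew < bestSkew || (skew == bestSkew && total < bestTotal)) {
                best = start;
                bestSkew = skew;
                bestTotal = total;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A "sandwich" (1000, 10, 1000) followed by three similar small segments.
        long[] sizes = {1000, 10, 1000, 12, 11, 10};
        int start = selectWindow(sizes, 3);
        System.out.println(start + " " + Arrays.toString(Arrays.copyOfRange(sizes, start, start + 3)));
    }
}
```

With these sizes it prints `3 [12, 11, 10]`: the three small, similar segments are merged and the sandwich is left alone. When to trigger a merge at all (for example, only when the best skew is under some threshold) is still open, as it is in the thread.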

I like your idea to keep "delete count per segment" in the segments
file.  This information is certainly useful to the merge policy
because it should proportionally reduce a segment's size according to
what percentage of its docs are deleted, and it should favor merging
segments with many deletes to free up storage.
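Assuming the delete count were stored per segment as suggested, the two uses above (shrinking a segment's size by its deleted fraction, and preferring segments whose merge frees the most storage) might look like the hypothetical sketch below. The `SegmentStats` record, its fields, and the segment names are illustrative only, not a Lucene API; the code needs Java 16+ for records and `Stream.toList()`.

```java
import java.util.List;

public class DeleteAwareSizing {

    /** Hypothetical per-segment stats; delCount assumes deletes are recorded in the segments file. */
    record SegmentStats(String name, long sizeInBytes, int docCount, int delCount) {

        /** Fraction of this segment's documents that are deleted (0.0 to 1.0). */
        double deletedFraction() {
            return docCount == 0 ? 0.0 : (double) delCount / docCount;
        }

        /** Size proportionally reduced by the deleted fraction; what a merge policy would compare. */
        long effectiveSize() {
            return Math.round(sizeInBytes * (1.0 - deletedFraction()));
        }

        /** Bytes that merging this segment would free by dropping its deleted docs. */
        long reclaimableBytes() {
            return sizeInBytes - effectiveSize();
        }
    }

    /** Orders segments so those whose merge frees the most storage come first. */
    static List<SegmentStats> byReclaimableBytes(List<SegmentStats> segments) {
        return segments.stream()
                .sorted((a, b) -> Long.compare(b.reclaimableBytes(), a.reclaimableBytes()))
                .toList();
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = List.of(
                new SegmentStats("_a", 1000, 100, 50),  // 50% deleted
                new SegmentStats("_b", 800, 80, 0),     // no deletes
                new SegmentStats("_c", 400, 40, 30));   // 75% deleted
        for (SegmentStats s : byReclaimableBytes(segments)) {
            System.out.println(s.name() + " effective=" + s.effectiveSize()
                    + " reclaimable=" + s.reclaimableBytes());
        }
    }
}
```

Here `_a` (500 reclaimable bytes) ranks ahead of `_c` (300), and `_b` ranks last since it has no deletes, even though `_c` has the higher deleted percentage. Ranking by absolute bytes freed rather than by percentage is a design choice of this sketch, not something the thread settles.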

Mike
