On 9/5/06, Ning Li <[EMAIL PROTECTED]> wrote:
> What about an invariant that says the number of main index segments
> with the same level (f(n)) should be less than M.

That is exactly what the second property says:
"Less than M number of segments whose doc count n satisfies B*(M^c) <=
n < B*(M^(c+1)) for any c >= 0."

In other words, fewer than M segments with the same f(n).
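Here is a minimal Java sketch of the level function, to make it concrete. It assumes B is the max buffered docs and M is the merge factor. The class and method names (SegmentLevels, level, satisfiesLevelInvariant) are hypothetical and are not Lucene API. Segments with fewer than B docs are given level -1 and are left out of the check, because the quoted property only covers c >= 0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helpers for the level function f(n): a segment with n docs is at
 * level c when B*(M^c) <= n < B*(M^(c+1)). Segments with n < B get level -1.
 */
final class SegmentLevels {
    private SegmentLevels() {}

    /** Returns f(n) for n >= B, or -1 when the segment has fewer than B docs. */
    static int level(long n, long b, long m) {
        if (b < 1 || m < 2) {
            throw new IllegalArgumentException("need B >= 1 and M >= 2");
        }
        if (n < b) {
            return -1;
        }
        int c = 0;
        long lower = b; // current lower bound B * M^c
        // Step up a level while n >= B * M^(c+1); dividing n avoids long overflow.
        while (n / m >= lower) {
            lower *= m;
            c++;
        }
        return c;
    }

    /** True if, for every level c >= 0, fewer than M segments are at level c. */
    static boolean satisfiesLevelInvariant(List<Long> docCounts, long b, long m) {
        Map<Integer, Integer> perLevel = new HashMap<>();
        for (long n : docCounts) {
            int c = level(n, b, m);
            if (c < 0) {
                continue; // the quoted property only constrains c >= 0
            }
            int count = perLevel.merge(c, 1, Integer::sum);
            if (count >= m) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with B = 10 and M = 10, doc counts 10 through 99 map to level 0 and 100 through 999 map to level 1. Ten segments of 100 docs each would therefore violate the invariant.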

Ah, I had missed that. But I don't believe that Lucene currently
obeys this in all cases.

> I am concerned about corner cases causing tons of segments and slowing
> search or causing errors due to file descriptor exhaustion.
>
> When merging, maybe we should count the number of segments at a
> particular index level f(n), rather than adding up the number of
> documents.  In the presence of deletions, this should lead to faster
> indexing (due to less frequent merges) I think.
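One way to express a count-based trigger is the hypothetical Java sketch below; it is not Lucene's actual merge code. It computes f(n) for each segment and looks for M segments at the same level, instead of summing doc counts. The sketch makes three assumptions: segments are listed oldest first, only a contiguous run is merged so that document order is preserved, and segments below B are grouped as level -1.

```java
import java.util.List;

/**
 * Hypothetical count-based merge trigger: count segments per level f(n) rather
 * than summing doc counts. Segments are listed oldest first; level -1 groups
 * segments with fewer than B docs.
 */
final class CountBasedTrigger {
    private CountBasedTrigger() {}

    /** f(n) for n >= B, or -1 for n < B (requires B >= 1 and M >= 2). */
    static int level(long n, long b, long m) {
        if (b < 1 || m < 2) {
            throw new IllegalArgumentException("need B >= 1 and M >= 2");
        }
        if (n < b) {
            return -1;
        }
        int c = 0;
        // Step up a level while n >= B * M^(c+1).
        for (long lower = b; n / m >= lower; lower *= m) {
            c++;
        }
        return c;
    }

    /**
     * Scans from the newest segment backwards and returns {start, end} (end
     * exclusive) of the first contiguous run of at least M same-level segments,
     * or null when no level is full.
     */
    static int[] findMergeRange(List<Long> docCounts, long b, long m) {
        int end = docCounts.size();
        while (end > 0) {
            int c = level(docCounts.get(end - 1), b, m);
            int start = end - 1;
            // Extend the run backwards while the previous segment is at the same level.
            while (start > 0 && level(docCounts.get(start - 1), b, m) == c) {
                start--;
            }
            if (end - start >= m) {
                return new int[] {start, end};
            }
            end = start; // move on to the next (older) run
        }
        return null;
    }
}
```

With B = 10 and M = 3, doc counts [500, 20, 20, 5, 5, 5] yield the range {3, 6}, which is the three trailing small segments. Dropping the last two 5-doc segments leaves no full level, so the method returns null.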

Given M, B, and an index that has L (0 < L < M) segments with fewer
than B docs each, how many RAM docs should be accumulated before a merge
is triggered? B is not good. B-sum(L), where sum(L) is the total doc
count of those L segments, is the old strategy, which has problems.

The new IndexWriter changes add an additional constraint: to delete
documents efficiently, the first merge must be on buffered documents
only, to ensure that doc ids don't change. We should also explore changing
the index invariants to accommodate this.
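The following is a simplified model of why the id constraint matters. It assumes that a document's id is its position in its own segment plus the sum of the maxDoc (live plus deleted slots) of all earlier segments. DocIdModel and its methods are hypothetical illustrations, not IndexWriter code. In this model, flushing the buffered docs as a new trailing segment leaves every existing id alone. A merge that drops deleted docs, by contrast, shifts the ids of every document that comes after those deletions.

```java
import java.util.Arrays;

/**
 * Hypothetical model: each segment has maxDoc slots (deleted docs keep their slot
 * until the segment is merged), and a doc's global id is the sum of maxDoc over
 * all earlier segments plus its local id.
 */
final class DocIdModel {
    private DocIdModel() {}

    /** Global id of local doc `local` in segment `seg`. */
    static int globalId(int[] maxDocs, int seg, int local) {
        int base = 0;
        for (int i = 0; i < seg; i++) {
            base += maxDocs[i];
        }
        return base + local;
    }

    /** Flushes buffered docs as a new trailing segment; earlier bases are untouched. */
    static int[] flushAsNewSegment(int[] maxDocs, int bufferedDocs) {
        int[] out = Arrays.copyOf(maxDocs, maxDocs.length + 1);
        out[maxDocs.length] = bufferedDocs;
        return out;
    }

    /**
     * Merges segments [from, to) into one segment that keeps only live docs, so if
     * any merged doc was deleted, ids after the first deletion shift down.
     */
    static int[] mergeRange(int[] maxDocs, int[] deleted, int from, int to) {
        int merged = 0;
        for (int i = from; i < to; i++) {
            merged += maxDocs[i] - deleted[i];
        }
        int[] out = new int[maxDocs.length - (to - from) + 1];
        System.arraycopy(maxDocs, 0, out, 0, from);
        out[from] = merged;
        System.arraycopy(maxDocs, to, out, from + 1, maxDocs.length - to);
        return out;
    }
}
```

Consider segments of 100, 50 and 30 docs with 10 deletions in the first segment. The first doc of the third segment has id 150, and it keeps that id after a 20-doc flush. Once the first two segments are merged, however, it becomes 140.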

Do you have any ideas in this area?  Is a monotonically decreasing
segment level (your f(n)) really required?

So between B-sum(L) and B? Once there are M segments with fewer
than B docs, they'll be merged. But what if L=0? Should B RAM docs be
accumulated before being flushed in that case?

It seems like it.  Examples are easier to visualize sometimes... do
you have an example where this wouldn't be advisable?

In any case, if flushing RAM docs causes the number of segments
with fewer than B docs to reach M in close(), a merge of those segments
should be triggered.
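The hypothetical Java sketch below puts the flush rules from this exchange together. FlushPolicySketch is not Lucene's IndexWriter. The sketch works as follows:
- It always buffers B docs before flushing. That matches the L = 0 answer above, but it is only one of the options being discussed for 0 < L < M.
- close() flushes any remainder.
- After each flush, the trailing segments with fewer than B docs are merged once there are M of them.

Merges at higher levels are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical flush policy (not Lucene's IndexWriter). B = docs to buffer
 * before flushing, M = merge factor. Segment doc counts are kept oldest first.
 */
final class FlushPolicySketch {
    private final int b;
    private final int m;
    private final List<Integer> segments = new ArrayList<>();
    private int buffered = 0;

    FlushPolicySketch(int b, int m) {
        this.b = b;
        this.m = m;
    }

    /** Buffers one doc; flushes once B docs are buffered (the L = 0 rule). */
    void addDocument() {
        buffered++;
        if (buffered >= b) {
            flush();
        }
    }

    /** Flushes any remaining buffered docs, possibly leaving a segment with < B docs. */
    void close() {
        if (buffered > 0) {
            flush();
        }
    }

    /** Returns a copy of the current segment doc counts, oldest first. */
    List<Integer> segments() {
        return new ArrayList<>(segments);
    }

    private void flush() {
        segments.add(buffered);
        buffered = 0;
        maybeMergeSmallSegments();
    }

    /** Merges the trailing segments with fewer than B docs once there are M of them. */
    private void maybeMergeSmallSegments() {
        int start = segments.size();
        while (start > 0 && segments.get(start - 1) < b) {
            start--;
        }
        if (segments.size() - start >= m) {
            List<Integer> tail = segments.subList(start, segments.size());
            int merged = 0;
            for (int n : tail) {
                merged += n;
            }
            tail.clear(); // removes the merged segments from the backing list
            segments.add(merged);
        }
        // Merges of segments with >= B docs (level 0 and above) are omitted here.
    }
}
```

Take B = 10 and M = 3, and run three sessions that add 4, 3 and 2 docs and call close() after each one. After the second close() the segments are [4, 3]. The third close() briefly produces [4, 3, 2] and then merges them into a single 9-doc segment.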

Right.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
