Just brainstorming a little...
Assuming B=1000 (max buffered docs) and M=10 (merge factor), since I
think better with concrete examples.

It seems like we should avoid unnecessary merging, allowing up to 9
segments of 1000 documents or less w/o merging.  When we reach 10
segments, they should be merged into a single segment.  Let's assume a
segment of size 8500 is created by the merge.

Assume we write another 10 full segments that are merged into a bigger
segment of size 10,000.
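
To make the walk-through above concrete, here is a rough toy model in Python. It is only a sketch: flush() and the list-of-sizes representation are made up for illustration (not Lucene code), it ignores deletions, and it assumes the 8500 comes from five full segments plus five 700-doc segments.

```python
def flush(segments, docs, B=1000, M=10):
    """Append a newly flushed segment of `docs` documents to `segments`
    (a list of segment sizes, oldest first). Once M small segments
    (<= B docs each) have piled up at the end, merge them into one."""
    if not 0 < docs <= B:
        raise ValueError("a flushed segment holds between 1 and B docs")
    segments.append(docs)

    # Count the trailing run of small segments.
    tail = 0
    for size in reversed(segments):
        if size > B:
            break
        tail += 1

    # Up to M-1 small segments are left alone; the M-th triggers a merge.
    if tail == M:
        merged = sum(segments[-M:])
        del segments[-M:]
        segments.append(merged)
    return segments


segments = []
# Assumed mix that happens to add up to 8500 docs.
for docs in [1000] * 5 + [700] * 5:
    flush(segments, docs)
print(segments)

# Another 10 full segments.
for docs in [1000] * 10:
    flush(segments, docs)
print(segments)
```

The first print shows the single 8500-doc segment, the second shows it followed by the 10,000-doc one.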

It *feels* like:
1) we should be able to write full segments of 1000 docs, or less
than that if closing the writer.
2) we should be able to write a full segment of 1000 docs *after* a
non-full segment w/o having to merge
3) 10,000 and 8,500 should be at the same index level, not different levels
4) 1000 and 999 docs should be at the same index level

So, I *think* most of our hypothetical problems go away with a simple
adjustment to f(n):

f(n) = floor(log_M((n-1)/B))

Right?

That allows us to write all buffered docs separately (necessary for
easy deletions), allows us to only merge M segments at a time
(decreases number of merges), and allows us to maintain a
monotonically non-increasing f(n) across the segments in the index.
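
A quick way to check the adjusted f(n) against the numbers above is the sketch below. The function names are mine, and unadjusted_level() assumes the form being adjusted is floor(log_M(n/B)), which is a guess at the starting point rather than something stated here. Integer arithmetic is used instead of a floating-point log so that exact powers of M do not land on the wrong side of floor().

```python
def level(n, B=1000, M=10):
    """f(n) = floor(log_M((n-1)/B)), computed with integer arithmetic."""
    if n < 2:
        # (n-1)/B is 0 for n == 1, and log(0) has no value.
        raise ValueError("f(n) is undefined for n < 2")
    x = n - 1
    if x >= B:
        # Find k >= 0 with B*M^k <= x < B*M^(k+1).
        k = 0
        while x >= B * M ** (k + 1):
            k += 1
    else:
        # Find k < 0 with B*M^k <= x < B*M^(k+1).
        k = -1
        while x * M ** (-k) < B:
            k -= 1
    return k


def unadjusted_level(n, B=1000, M=10):
    """Assumed unadjusted form, floor(log_M(n/B)); equals level(n + 1)."""
    return level(n + 1, B, M)


for n in (999, 1000, 8500, 10000):
    print(n, unadjusted_level(n), level(n))
```

With the adjustment, 999 and 1000 share one level and 8500 and 10,000 share the next one up; under the assumed unadjusted form, each pair is split across two levels.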

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server