On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Right I'm calling a newly created segment (ie flushed from RAM) level
> 0 and then a level 1 segment is created when you merge 10 level 0
> segments, level 2 is created when merge 10 level 1 segments, etc.

That is not how the current merge policy works. There are two
orthogonal aspects to this problem:
1. the measurement of a segment's size
2. the merge behaviour given that measurement

In the current code:
1. The measurement of a segment's size is the document count of the
segment, not its actual RAM or file size. Levels are defined in terms
of this measurement.
2. The behaviour is described by two invariants, which hold as long as
mergeFactor (M) does not change and no segment's doc count reaches
maxMergeDocs. Let B be maxBufferedDocs, and define the level of a
segment with n docs as f(n) = ceil(log_M(ceil(n/B))). Then:
1) If i (left*) and i+1 (right*) are two consecutive segments with
doc counts x and y, then f(x) >= f(y).
2) The number of committed segments on the same level (the same value
of f(n)) is <= M.
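
To make the two invariants concrete, here is a rough sketch in Python
(not the actual Lucene code). The function names level and
check_invariants are made up for illustration, and M = 10, B = 10 in
the usage lines are just assumed values. The level is computed with
integer arithmetic rather than a floating-point log so that exact
powers of M land on the right level.

```python
from collections import Counter

def level(n, M, B):
    """f(n) = ceil(log_M(ceil(n/B))), computed without floating point."""
    units = -(-n // B)          # ceil(n / B)
    k, power = 0, 1
    while power < units:        # smallest k with M**k >= units
        power *= M
        k += 1
    return k

def check_invariants(doc_counts, M, B):
    """doc_counts lists segment doc counts from left to right."""
    levels = [level(n, M, B) for n in doc_counts]
    # Invariant 1: levels never increase from left to right.
    non_increasing = all(a >= b for a, b in zip(levels, levels[1:]))
    # Invariant 2: at most M segments share any one level.
    bounded = all(c <= M for c in Counter(levels).values())
    return non_increasing and bounded

# Assumed values for illustration only.
M, B = 10, 10
print([level(n, M, B) for n in (10, 11, 100, 101, 1000)])
print(check_invariants([1000, 100, 100, 10, 10], M, B))
print(check_invariants([10, 100], M, B))
```

With these values, segments of up to 10 docs are level 0, 11 to 100
docs are level 1, and 101 to 1000 docs are level 2. The last call
fails invariant 1 because a level 0 segment sits to the left of a
level 1 segment.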
Document counts are an approximation of segment sizes, and thus an
approximation of merge cost. Sometimes, however, they do not reflect
segment sizes correctly. So it is probably a good idea to use RAM or
file size as the measurement of a segment's size, as Mike suggested.
But the behaviour does not have to change: the two invariants can
still be guaranteed, with the definitions of sizes and levels modified
according to the new measurement.
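
As a rough illustration of swapping the measurement while keeping the
level definition, the Python sketch below applies the same formula
once to doc counts and once to byte sizes. The segment data, the
64 KB base size standing in for B, and the name level are all
hypothetical; nothing here is taken from the Lucene code.

```python
def level(measure, merge_factor, base):
    """ceil(log_M(ceil(measure/base))) using integer arithmetic."""
    units = -(-measure // base)     # ceil(measure / base)
    k, power = 0, 1
    while power < units:            # smallest k with M**k >= units
        power *= merge_factor
        k += 1
    return k

# Hypothetical segments: same doc count, very different file sizes.
segments = [
    {"docs": 10, "bytes": 5_000_000},
    {"docs": 10, "bytes": 4_000},
]

M = 10                  # assumed mergeFactor
BASE_DOCS = 10          # assumed maxBufferedDocs
BASE_BYTES = 64 * 1024  # assumed byte threshold playing the role of B

doc_levels = [level(s["docs"], M, BASE_DOCS) for s in segments]
byte_levels = [level(s["bytes"], M, BASE_BYTES) for s in segments]
print(doc_levels)   # both segments look identical by doc count
print(byte_levels)  # byte size puts them on different levels
```

Only the measurement and the base unit change; the invariants are
stated over the resulting levels exactly as before.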
Ning