Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

Ning Li Fri, 23 Mar 2007 08:21:42 -0800

On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote:

Right I'm calling a newly created segment (ie flushed from RAM) level
0 and then a level 1 segment is created when you merge 10 level 0
segments, level 2 is created when merge 10 level 1 segments, etc.


That is not how the current merge policy works. There are two
orthogonal aspects to this problem:
 1 the measurement of a segment size
 2 the merge behaviour given a measurement

In the current code:
 1 The measurement of a segment size is the document count in the
segment, not the actual RAM or file size. Levels are defined according
to this measurement.
 2 The behaviour is the two invariants when mergeFactor (M) does not
change and segment doc count is not reaching maxMergeDocs: B for
maxBufferedDocs, f(n) defined as ceil(log_M(ceil(n/B)))
    1) If i (left*) and i+1 (right*) are two consecutive segments of
doc counts x and y, then f(x) >= f(y).
    2) The number of committed segments on the same level (f(n)) <= M.

The document counts are approximation of segment sizes thus
approximation of merge cost. Sometimes, however, they do not correctly
reflect segment sizes. So it is probably a good idea to use RAM or
file size as measurement of a segment size as Mike suggested. But the
behaviour does not have to change: the two invariants can still be
guaranteed, with the definition of sizes and levels modified according
to the new measurement.

Ning

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

Reply via email to