On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Yes, the code re-computes the level of a given segment from the current values of maxBufferedDocs & mergeFactor. But when these values have changed (or segments were flushed by RAM usage rather than by maxBufferedDocs), the way it computes the level no longer produces the logarithmic policy it's trying to implement, I think.
The algorithm gradually re-adjusts toward the latest maxBufferedDocs & mergeFactor - see case 3 of the "Overview of merge policy" comment in the code. With the modification that uses RAM or file size as the segment size, the algorithm would work from maxBufferedSize & mergeFactor instead. Let's say maxBufferedDocs or maxBufferedSize is the base size. LUCENE-845 complains that the merge behaviour for segments <= base size is in some cases not logarithmic. It's a tradeoff: we always keep small segments in check. The algorithm reflects the tradeoff made for segments <= base size.
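To make the idea concrete, here is a minimal sketch of how a logarithmic merge policy might derive a segment's level from its size in bytes rather than its doc count. The names (`level`, `baseSize`, `mergeFactor`) are illustrative, not Lucene's actual fields; the point is just that level 0 covers segments up to the base size, and each higher level covers segments mergeFactor times larger:

```java
// Hypothetical sketch: derive a segment's merge "level" from size in bytes.
// baseSize plays the role of maxBufferedSize; names are illustrative only.
public class LevelSketch {
    static int level(long sizeInBytes, long baseSize, int mergeFactor) {
        // Level 0 holds segments up to baseSize; each subsequent level
        // holds segments up to mergeFactor times the previous bound.
        int level = 0;
        long upperBound = baseSize;
        while (sizeInBytes > upperBound) {
            upperBound *= mergeFactor;
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        // e.g. baseSize = 1 MB, mergeFactor = 10
        System.out.println(level(500_000L, 1_000_000L, 10));    // 0
        System.out.println(level(5_000_000L, 1_000_000L, 10));  // 1
        System.out.println(level(50_000_000L, 1_000_000L, 10)); // 2
    }
}
```

Under this scheme, any mergeFactor segments on the same level merge into a segment roughly one level higher, which is what keeps the total segment count logarithmic in the index size - for segments above the base size, at least.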
Exactly: when the logarithmic policy works "correctly" (you don't change mergeFactor/maxBufferedDocs and your docs are all uniform in size), it does achieve this "merge roughly equal sizes in bytes" behaviour (yes, those two numbers are roughly equal). Though now I have to go ponder KS's Fibonacci series approach!
It doesn't have to be a Fibonacci series; logarithmic would work well too. The main difference is that KS can choose any segments to merge, not just adjacent ones, so it may find better candidates for a merge.
Basically, this would keep the same logarithmic approach now, but derive levels somehow from the net size in bytes.
Exactly! Levels defined by size in bytes.

Cheers,
Ning