No, in my example optimize() was never called. The merge rule operates recursively. So, after 99 documents had been added, the segment stack contained nine segments with ten documents each and nine segments with one document each. When the hundredth document was added, the nine one-document segments, together with the new document's segment, were popped off the stack and merged into a single ten-document segment that was pushed onto the stack. So then the top of the stack had ten segments each containing ten documents, i.e., mergeFactor segments of the same size, and these ten segments were then merged into a single segment of 100 documents. So adding the 100th document triggered two merges.

(One error in my previous example: the 100 document segment actually contains documents 0-99, not 0-100.)
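
The recursive rule can be sketched as a toy simulation. This is not Lucene code: the class name MergeSketch and its methods are made up for illustration, it tracks only segment sizes, and it assumes no deletions and none of the in-memory buffering Lucene does before writing to disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the merge rule (hypothetical names, not the Lucene API). */
class MergeSketch {
    final int mergeFactor;
    // Segment sizes; the end of the list is the top of the stack.
    final List<Integer> stack = new ArrayList<>();

    MergeSketch(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
    }

    /** Adds one document and returns how many merges that add triggered. */
    int addDocument() {
        stack.add(1); // every document starts out in its own segment
        int merges = 0;
        // Keep merging while the top mergeFactor segments have the same size.
        while (topRunIsMergeable()) {
            int merged = 0;
            for (int i = 0; i < mergeFactor; i++) {
                merged += stack.remove(stack.size() - 1);
            }
            stack.add(merged);
            merges++;
        }
        return merges;
    }

    private boolean topRunIsMergeable() {
        int n = stack.size();
        if (n < mergeFactor) {
            return false;
        }
        int top = stack.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (stack.get(i) != top) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MergeSketch index = new MergeSketch(10);
        int mergesOnLastAdd = 0;
        for (int i = 0; i < 100; i++) {
            mergesOnLastAdd = index.addDocument();
        }
        System.out.println(index.stack + ", merges on last add: " + mergesOnLastAdd);
    }
}
```

With mergeFactor 10, the hundredth call to addDocument() in this model returns 2 and leaves a single 100-document segment on the stack, matching the walk-through above.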

A corollary of this is that, when mergeFactor is 10 and no deletions have been made, the segments correspond to the digits in the decimal representation of the number of documents in the index. So, an 85-document index has eight segments with ten documents each and five segments with one document each. (This is somewhat of a simplification, as Lucene automatically merges single-document segments before ever writing them to disk as an optimization.)
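
As a rough illustration of the digit correspondence, the sketch below predicts segment sizes directly from the digits of the document count. The class and method names are hypothetical, and extending the rule from decimal digits to base-mergeFactor digits for other merge factors is an assumption of this sketch rather than something stated above. It also ignores deletions and the in-memory merging of single-document segments.

```java
import java.util.ArrayList;
import java.util.List;

/** Predicts segment sizes from the digits of the document count (illustrative only). */
class SegmentDigits {
    /** Returns the predicted segment sizes, largest segments first. */
    static List<Integer> predictedSegmentSizes(int docCount, int mergeFactor) {
        if (docCount < 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("need docCount >= 0 and mergeFactor >= 2");
        }
        List<Integer> sizes = new ArrayList<>();
        int size = 1; // segment size for the current digit position
        for (int remaining = docCount; remaining > 0; remaining /= mergeFactor) {
            int digit = remaining % mergeFactor;
            // A digit d at this position means d segments of the current size.
            for (int i = 0; i < digit; i++) {
                sizes.add(0, size); // prepend so that larger segments come first
            }
            size *= mergeFactor;
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(predictedSegmentSizes(85, 10));
    }
}
```

For 85 documents and mergeFactor 10 this yields eight entries of 10 followed by five entries of 1, the same breakdown as in the paragraph above.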

It is most beneficial to call IndexWriter.optimize() only when you know you won't be adding documents to an index for a while, but will be searching it a lot. Calling optimize() periodically while indexing mostly just slows things down.

Doug

Otis Gospodnetic wrote:
I see, so every mergeFactor documents they are combined into a single
new segment in the index, and only when optimize() is called do those
multiple segments get merged into a single segment.
In your example below that would mean that optimize() was called after
document 100 was added, hence a single segment with documents 0-100.
Is this right?

Thanks,
Otis

--- Doug Cutting <[EMAIL PROTECTED]> wrote:

Merging happens constantly as documents are added. Each document is initially added in its own segment, and pushed onto the segment
stack. Whenever there are mergeFactor segments on the top of the stack that
are the same size, these are merged together into a new single segment
that replaces them. So, if mergeFactor is 10, and you've added 122 documents, the stack will have five segments, as follows:
document 121
document 120
documents 110-119
documents 100-109
documents 0-100
The next merge will happen after document 129 is added, when the segments for document 120 through document 129 will be replaced by a single new segment.

It's actually a little more complicated than that, since (among other
reasons) after documents are deleted a segment's size will no longer
be exactly a power of the mergeFactor.

Doug

Otis Gospodnetic wrote:

This is via mergeFactor?

--- Doug Cutting <[EMAIL PROTECTED]> wrote:


The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen as more documents are added to the index, without optimization.
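
A minimal sketch of that behavior, assuming a much simplified model: deleting only sets a mark (the role the ".del" file plays), and the document's data goes away when its segment is next merged. The Segment class and merge() helper here are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of mark-then-purge deletion (hypothetical names, not the Lucene API). */
class DeleteSketch {
    /** Stand-in for a segment: stored documents plus deletion marks. */
    static class Segment {
        final List<String> docs = new ArrayList<>();
        final BitSet deleted = new BitSet(); // plays the role of the ".del" file

        /** Marks a document as deleted; its data stays in the segment. */
        void markDeleted(int docNum) {
            if (docNum < 0 || docNum >= docs.size()) {
                throw new IndexOutOfBoundsException("no such document: " + docNum);
            }
            deleted.set(docNum);
        }

        int storedCount() {
            return docs.size();
        }

        int liveCount() {
            return docs.size() - deleted.cardinality();
        }
    }

    /** Merges segments into a new one, copying only documents not marked deleted. */
    static Segment merge(List<Segment> segments) {
        Segment merged = new Segment();
        for (Segment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) {
                    merged.docs.add(s.docs.get(i));
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Segment segment = new Segment();
        segment.docs.add("doc0");
        segment.docs.add("doc1");
        segment.docs.add("doc2");
        segment.markDeleted(1);
        // The deleted document is still stored until the segment is merged.
        System.out.println(segment.storedCount() + " stored, " + segment.liveCount() + " live");
        System.out.println(merge(List.of(segment)).docs);
    }
}
```

In this model the merge can come from either source mentioned above: an explicit optimize, or the ordinary merges triggered as more documents are added.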

Scott Ganyo wrote:


It just marks the record as deleted.  The record isn't actually removed until the index is optimized.

Scott

Rob Outar wrote:



Hello all,

  I used the delete(Term) method, then I looked at the index files. Only one
file changed, "_1tx.del". I found references to the file still in some of the
index files, so my question is: how does Lucene handle deletes?

Thanks,

Rob


--
To unsubscribe, e-mail:
For additional commands, e-mail:

