(One error in my previous example: the 100 document segment actually contains documents 0-99, not 0-100.)
A corollary of this is, when mergeFactor is 10 and no deletions have been made, the segments correspond to the digits in the decimal representation of the number of documents in the index. So, an 85 document index has eight segments with 10 documents and five segments with one documents. (This is somewhat of a simplification, as Lucene automatically merges single document segments before ever writing them to disk as an optimization.)
It is most beneficial to call IndexWriter.optimize() only when you know you won't be adding documents to an index for a while, but will be searching it a lot. Calling optimize() periodically while indexing mostly just slows things down.
Doug
Otis Gospodnetic wrote:
I see, so every mergeFactor documents they are compined into a single new segment in the index, and only when optimize() is called do those multiple segments get merged into a single segment. In your example below that would mean that optimize() was called after document 100 was added, hence a single segment with documents 0-100. Is this right?Thanks, Otis --- Doug Cutting <[EMAIL PROTECTED]> wrote:Merging happens constantly as documents are added. Each document is initially added in its own segment, and pushed onto the segment
stack. Whenever there are mergeFactor segments on the top of the stack that
are the same size, these are merged together into a new single segment
that replaces them. So, if mergeFactor is 10, and you've added 122 documents, the stack will have five segments, as follows:
document 121
document 120
documents 110-119
documents 100-109
documents 0-100
The next merge will happen after document 129 is added, when a new segment will replace the segments for document 120 through document
129 with a new single segment.
It's actually a little more complicated than that, since (among other
reasons) after docuuments are deleted a segment's size will no longer
be exactly a power of the mergeFactor.
Doug
Otis Gospodnetic wrote:
This is via mergeFactor? --- Doug Cutting <[EMAIL PROTECTED]> wrote:The data is actually removed the next time its segment is merged. Optimizing forces it to happen, but it will also eventually happen
as__________________________________________________more documents are added to the index, without optimization. Scott Ganyo wrote:removedIt just marks the record as deleted. The record isn't actually
files,until the index is optimized. Scott Rob Outar wrote:Hello all, I used the delete(Term) method, then I looked at the index
someonly one file changed "_1tx.del" I found references to the file still in
--of the
index files, so my question is how does Lucene handle deletes?
Thanks,
Rob
--
To unsubscribe, e-mail:
For additional commands, e-mail:
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>
Do you Yahoo!?
Yahoo! Mail Plus � Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com
--
To unsubscribe, e-mail:<mailto:[EMAIL PROTECTED]><mailto:[EMAIL PROTECTED]>For additional commands, e-mail:
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>
__________________________________________________ Do you Yahoo!? Yahoo! Mail Plus � Powerful. Affordable. Sign up now. http://mailplus.yahoo.com -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
-- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
