Deleted docs are actually recorded separately, per segment, in
files named _X_N.del, where X is the segment name and N is a
generation count that increases by 1 every time new deletes are
committed to that segment.
Normal segment merging will also drop the deleted docs from the
segments being merged, which collapses down the docIDs. You can
also call IndexWriter.expungeDeletes() to remove all holes from the
index. That method just merges adjacent segments that have deletes ...
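
To make the docID behaviour concrete, here is a minimal sketch of the idea in plain Java. It does not use Lucene at all: SegmentSketch and its methods are hypothetical names invented for illustration, and the BitSet only stands in for what a .del file records. The point is that a delete just sets a bit and leaves the numbering alone, while a merge copies the live docs and renumbers them.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of a single segment; this is NOT Lucene's API.
final class SegmentSketch {
    private final List<String> docs = new ArrayList<>();
    // One bit per docID, standing in for the contents of a .del file.
    private final BitSet deleted = new BitSet();

    void add(String doc) {
        docs.add(doc);
    }

    // Deleting only marks the doc; no docID shifts.
    void delete(int docId) {
        if (docId < 0 || docId >= docs.size()) {
            throw new IndexOutOfBoundsException("no such docID: " + docId);
        }
        deleted.set(docId);
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    String get(int docId) {
        return docs.get(docId);
    }

    // Highest docID + 1, counting deleted docs (the holes).
    int maxDoc() {
        return docs.size();
    }

    // Number of live docs only.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    // Merging copies only live docs, so the new segment has no holes
    // and the surviving docs get new, contiguous docIDs.
    static SegmentSketch merge(SegmentSketch... segments) {
        SegmentSketch merged = new SegmentSketch();
        for (SegmentSketch s : segments) {
            for (int id = 0; id < s.maxDoc(); id++) {
                if (!s.isDeleted(id)) {
                    merged.add(s.get(id));
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        SegmentSketch seg = new SegmentSketch();
        for (String d : new String[] {"doc1", "doc2", "doc3", "doc4", "doc5"}) {
            seg.add(d);
        }
        seg.delete(2); // marks doc3; doc4 is still docID 3

        System.out.println("before merge: maxDoc=" + seg.maxDoc()
                + " numDocs=" + seg.numDocs() + " docID 3=" + seg.get(3));

        SegmentSketch merged = SegmentSketch.merge(seg);
        System.out.println("after merge:  maxDoc=" + merged.maxDoc()
                + " numDocs=" + merged.numDocs() + " docID 2=" + merged.get(2));
    }
}
```

In this model the table from the original question is what you see only after the merge step; right after the delete, docID 2 is still occupied by a doc that is flagged as deleted.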
Lucene used to also have a file "deletable" which tracked those index
files that should be deleted, but that is no longer used as of 2.1.
Instead, Lucene computes (using reference counting) which files in the
directory are no longer referenced.
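
The reference-counting idea can be sketched roughly as below. This is only an illustration in plain Java, not Lucene's actual implementation: FileRefCountSketch, incRef, decRef and unreferenced are hypothetical names, and the file names in main are just examples that follow the _X_N.del pattern described above. Each commit holds a reference on the files it uses, and a file in the directory that nobody references any more is safe to delete.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of reference counting over index files; NOT Lucene's code.
final class FileRefCountSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // A new commit starts referencing these files.
    void incRef(Collection<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    // A commit is dropped; a file whose count reaches zero loses its entry.
    void decRef(Collection<String> files) {
        for (String f : files) {
            refCounts.computeIfPresent(f, (name, count) -> count > 1 ? count - 1 : null);
        }
    }

    // Files present in the directory that no commit references any more.
    Set<String> unreferenced(Collection<String> directoryListing) {
        Set<String> result = new TreeSet<>();
        for (String f : directoryListing) {
            if (!refCounts.containsKey(f)) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example file names only; a second commit adds a newer deletes generation.
        List<String> commit1 = Arrays.asList("segments_1", "_0.cfs", "_0_1.del");
        List<String> commit2 = Arrays.asList("segments_2", "_0.cfs", "_0_2.del");

        FileRefCountSketch counter = new FileRefCountSketch();
        counter.incRef(commit1);
        counter.incRef(commit2);
        counter.decRef(commit1); // the older commit is no longer needed

        List<String> directory = Arrays.asList(
                "segments_1", "segments_2", "_0.cfs", "_0_1.del", "_0_2.del");
        System.out.println("safe to delete: " + counter.unreferenced(directory));
    }
}
```

Note that _0.cfs survives in this sketch because the newer commit still references it, even though the older commit that also used it has been dropped.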
Mike
Anshum wrote:
Hi John,
In case of deletions, it is just a delayed delete. In other words,
the doc is just marked as deleted in the deletable file, leaving a
void in the numbering of docs. The actual shifting of document ids
happens only when you optimize the index. In that case the deletables
file is used to physically remove the docs from the index.
Hope that clears the doubt :)
--
Anshum Gupta
Naukri Labs!
On Fri, Jul 11, 2008 at 8:24 AM, John Griffin <[EMAIL PROTECTED]> wrote:
Guys (and Gals),
A question on index deletions, what exactly happens to the Lucene
document numbers in an index when a document is deleted? Let's say
I have a 5 doc index.
Document # Doc
0 doc1
1 doc2
2 doc3
3 doc4
4 doc5
If doc 2 is deleted, is this what I'm left with?
Document # Doc
0 doc1
1 doc2
2 doc4
3 doc5
This is my assumption. If not, what DOES happen?
TIA
John G.
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............