On 16/12/2011 20:54, Paul Taylor wrote:
On 16/12/2011 17:43, Uwe Schindler wrote:
Hi,
I'm adding documents to an index. At a later date I modify a document
and update the index, close the writer and open a new IndexReader. My
IndexReader iterates over the terms for that field, and docFreq()
returns one, as I would expect. However, the iterator returns both the
old value of the document and the new value. I don't expect (or want)
the old value to still be in the index, so why is this?
That is all as expected. Updating a document in a Lucene index is an
atomic delete/add operation. Deleting in Lucene just marks the document
for deletion, but it is still there (search results won't return it).
The consequence is that all terms are still in the terms index and all
document frequencies still include both documents. This *may* cause
scoring problems in indexes with many deletes (but those will go away as
merging will remove them, see below), but this is known (see wiki,
javadocs, ...).
Once you add more documents, the index will merge segments, and that
will make the deleted documents disappear. If you really want to remove
the old documents with all their terms (this is veeeeery expensive), you
can call IW.forceMergeDeletes:
http://lucene.apache.org/java/3_5_0/api/core/org/apache/lucene/index/IndexWriter.html#forceMergeDeletes()
The way inverted indexes work makes it impossible to update the terms
index afterwards.
Uwe
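
To make the behaviour described above concrete, here is a minimal sketch
in plain Java. It is a toy model and not Lucene code: ToyIndex and its
methods (add, update, terms, search, merge) are hypothetical names made
up for illustration, and each document is assumed to hold one
single-term field. It only shows why a stale term is still enumerated
after an update, why a search does not return the old document, and why
a merge finally drops the old term.

```java
import java.util.*;

// Toy model only: not Lucene's real data structures or API.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();   // docId -> field value
    private final BitSet deleted = new BitSet();            // "marked for deletion" flags
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>(); // term -> docIds

    int add(String value) {
        int id = docs.size();
        docs.add(value);
        postings.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
        return id;
    }

    // Update = mark the old doc as deleted, then add the new one.
    // The postings of the old doc are left untouched.
    int update(int oldId, String newValue) {
        deleted.set(oldId);
        return add(newValue);
    }

    // Term enumeration walks the term dictionary and ignores deletions.
    Set<String> terms() {
        return postings.keySet();
    }

    // Search filters out docs that are marked deleted.
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(term, Collections.emptyList())) {
            if (!deleted.get(id)) hits.add(id);
        }
        return hits;
    }

    // A "merge" rebuilds the index from live docs only, so stale terms vanish.
    ToyIndex merge() {
        ToyIndex merged = new ToyIndex();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id)) merged.add(docs.get(id));
        }
        return merged;
    }
}
```

In this sketch, adding "old" and then updating it to "new" leaves both
terms visible through terms(), while search("old") comes back empty.
Only the index returned by merge() no longer lists "old".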
Hi
Thanks, I think you might have it. But tell me: if forceMergeDeletes()
is a bad idea, is there a query I can run that just returns all docs,
rather than the iteration I use? (What I want is the value of a
particular field in each doc.)
Paul
Never mind, I've got it working by adding another field to the index
that always has the same value, which I can search on.
Thanks, Paul
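
The workaround above, and the alternative of walking the documents
directly, can be sketched as follows. This is again a plain-Java toy and
not Lucene code: ToyStore, the marker field name "all" and its value "1"
are assumptions chosen for illustration. In Lucene 3.x the scan would
roughly correspond to looping up to IndexReader.maxDoc() and skipping
IndexReader.isDeleted(i), and the marker-field search could probably be
replaced by a MatchAllDocsQuery, but check the javadocs for your
version.

```java
import java.util.*;

// Toy model only: a hypothetical stand-in for an index with stored fields.
class ToyStore {
    private final List<Map<String, String>> docs = new ArrayList<>(); // docId -> stored fields
    private final BitSet deleted = new BitSet();                      // deletion marks

    int add(Map<String, String> fields) {
        Map<String, String> copy = new HashMap<>(fields);
        copy.put("all", "1"); // constant marker field, same value in every doc
        docs.add(copy);
        return docs.size() - 1;
    }

    void delete(int id) {
        deleted.set(id);
    }

    // Option 1: walk every doc ID and skip the ones marked deleted.
    List<String> valuesByScan(String field) {
        List<String> out = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id)) out.add(docs.get(id).get(field));
        }
        return out;
    }

    // Option 2: "search" on the constant marker field.
    // Like a real search, it never returns docs marked deleted.
    List<String> valuesByMarkerSearch(String field) {
        List<String> out = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            boolean matches = "1".equals(docs.get(id).get("all"));
            if (matches && !deleted.get(id)) out.add(docs.get(id).get(field));
        }
        return out;
    }
}
```

Both options return the field value of live documents only, which is
what the term iteration could not guarantee.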