Hi Thomas,

On Wed, Feb 6, 2013 at 2:50 PM, Becker, Thomas <thomas.bec...@netapp.com> wrote:
> I've built a search prototype feature for my application using Lucene, and it 
> works great.  The application monitors a remote system and currently indexes 
> just a few core attributes of the objects on that system.  I get 
> notifications when objects change, and I then update the Lucene index to keep 
> things in sync.   The thing is that even when objects on the remote system 
> are updated, it's relatively unlikely that the specific attributes I'm 
> indexing (like name) were changed.  From what I can see, 
> IndexWriter.updateDocument() makes no effort to determine if the existing 
> document is actually dirty compared to the provided one.  My questions are:
>
> Is this true that documents are assumed to be changed and not actually 
> checked before replacement?

Yes, it's true.

> Has such a feature been considered?

I'm not sure, but I see several issues. For example, if you reindex
the exact same document with a different analyzer, the indexed
terms/positions/offsets/payloads might be different. Moreover, such a
comparison is only possible if the document's fields are stored, which
is something that Lucene doesn't enforce.

> Is it worth it to query for the document, manually dirty check it and then 
> delete/re-add only if it's different if changes to the indexed fields are 
> relatively uncommon?  My concern is that I'm inadvertently causing a lot of 
> segment churn for things that aren't actually changing.

You could try to do it, but it may be just fine the way it is: as
segments get merged, deleted docs eventually get expunged.
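
If you do want to try the dirty check, one option that doesn't depend
on stored fields is to remember, outside the index, the values you last
indexed for each object and to skip the update when they are unchanged.
Below is a minimal sketch of that idea, not a Lucene feature. The class
name, the method name and the in-memory map are all hypothetical, and
the callback is a stand-in for wherever you call
IndexWriter.updateDocument() today.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Skips index updates when the indexed attribute values are unchanged. */
class DirtyCheckingUpdater {
    // Values last sent to the index, keyed by object id.
    // Hypothetical side cache kept by the application, not a Lucene structure.
    private final Map<String, List<String>> lastIndexed = new HashMap<>();

    // Stand-in for the real update, e.g. a call to IndexWriter.updateDocument().
    private final BiConsumer<String, List<String>> indexUpdate;

    DirtyCheckingUpdater(BiConsumer<String, List<String>> indexUpdate) {
        this.indexUpdate = indexUpdate;
    }

    /** Returns true if the index was updated, false if the change was skipped. */
    boolean onObjectChanged(String id, List<String> indexedValues) {
        List<String> snapshot = List.copyOf(indexedValues);
        if (snapshot.equals(lastIndexed.get(id))) {
            return false; // indexed attributes unchanged, nothing to rewrite
        }
        indexUpdate.accept(id, snapshot);
        // Record the values only after the update call returned normally.
        lastIndexed.put(id, snapshot);
        return true;
    }
}
```

The map is empty after a restart, so the first notification for each
object always goes through to the index. It also holds every indexed
value in memory; storing a hash of the values instead would shrink it,
at the cost of a small chance of a missed update on a collision.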

-- 
Adrien

