I've built a prototype search feature for my application using Lucene, and it
works great. The application monitors a remote system and currently indexes
just a few core attributes of the objects on that system. I get notifications
when objects change, and I then update the Lucene index to keep things in sync.
The thing is that even when objects on the remote system are updated, it's
relatively unlikely that the specific attributes I'm indexing (like name) have
changed. From what I can see, IndexWriter.updateDocument() makes no effort to
determine whether the existing document is actually dirty compared to the
provided one. My questions are:

Is it true that documents are assumed to have changed, and are not actually
checked, before replacement?

Has such a feature been considered?

Is it worth querying for the document, manually dirty-checking it, and then
deleting/re-adding it only if it differs, given that changes to the indexed
fields are relatively uncommon? My concern is that I'm inadvertently causing a
lot of segment churn for documents that aren't actually changing.
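For illustration, here is a minimal sketch of a dirty check that skips the
write entirely when nothing indexed has changed. Note that it swaps the
"query the index and compare" idea for an in-memory cache of the values last
written per object ID, so it assumes the cache fits in memory, that the
indexed attributes can be represented as a Map<String, String>, and that an
ID missing from the cache (e.g. after a restart) is simply treated as dirty.
The class and method names (IndexedFieldCache, isDirty, markIndexed) are
hypothetical, not part of Lucene; only IndexWriter.updateDocument(Term, doc)
appears in the usage comment.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene): remembers the indexed field
 * values last written for each object ID, so that unchanged objects can
 * skip IndexWriter.updateDocument() altogether.
 *
 * Assumed usage:
 *   if (cache.isDirty(id, fields)) {
 *       writer.updateDocument(new Term("id", id), buildDocument(id, fields)); // buildDocument is hypothetical
 *       cache.markIndexed(id, fields); // record only after the write succeeded
 *   }
 */
final class IndexedFieldCache {
    // Object ID -> immutable snapshot of the indexed fields last written.
    private final Map<String, Map<String, String>> lastIndexed = new HashMap<>();

    /**
     * Returns true if the object has never been recorded, or if any indexed
     * field differs from the recorded snapshot. Does not modify the cache.
     */
    boolean isDirty(String id, Map<String, String> indexedFields) {
        Map<String, String> previous = lastIndexed.get(id);
        return previous == null || !previous.equals(indexedFields);
    }

    /** Records the values that were just written to the index. */
    void markIndexed(String id, Map<String, String> indexedFields) {
        // Map.copyOf (Java 10+) takes an immutable snapshot; it rejects null keys/values.
        lastIndexed.put(id, Map.copyOf(indexedFields));
    }

    /** Forgets an object, e.g. after its document was deleted from the index. */
    void remove(String id) {
        lastIndexed.remove(id);
    }
}
```

Splitting the check (isDirty) from the bookkeeping (markIndexed) keeps the
cache from claiming a value was indexed if the write throws. If the cache is
lost, the worst case is one redundant update per object, which is the same
cost as today.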

Thanks in advance,
Tommy