Hi Thomas,

On Wed, Feb 6, 2013 at 2:50 PM, Becker, Thomas <thomas.bec...@netapp.com> wrote:
> I've built a search prototype feature for my application using Lucene, and it
> works great. The application monitors a remote system and currently indexes
> just a few core attributes of the objects on that system. I get
> notifications when objects change, and I then update the Lucene index to keep
> things in sync. The thing is that even when objects on the remote system
> are updated, it's relatively unlikely that the specific attributes I'm
> indexing (like name) were changed. From what I can see,
> IndexWriter.updateDocument() makes no effort to determine if the existing
> document is actually dirty compared to the provided one. My questions are:
>
> Is this true that documents are assumed to be changed and not actually
> checked before replacement?

Yes, it's true.

> Has such a feature been considered?

I'm not sure, but I see several issues. For example, if you reindex the
exact same document with a different analyzer, the index
terms/positions/offsets/payloads might be different. Moreover, one can only
perform such a comparison if the document is stored, which is something
that Lucene doesn't enforce.

> Is it worth it to query for the document, manually dirty check it and then
> delete/re-add only if it's different if changes to the indexed fields are
> relatively uncommon? My concern is that I'm inadvertently causing a lot of
> segment churn for things that aren't actually changing.

You could try to do it, but maybe it is just fine the way it is: as
segments get merged, deleted docs eventually get expunged.

--
Adrien
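
If you do want to try the manual dirty check, one option is to skip the index lookup and instead remember, on the application side, what was last indexed for each object. Below is a minimal sketch of that idea, assuming the indexed attributes are few enough to keep in memory as a map of strings. DirtyCheckingIndexer and the indexUpdater callback are hypothetical names; the callback stands in for whatever code builds the Document and calls IndexWriter.updateDocument().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Skips index writes when an object's indexed attributes have not changed. */
class DirtyCheckingIndexer {
    // Attribute values most recently sent to the index, keyed by object id.
    private final Map<String, Map<String, String>> lastIndexed = new HashMap<>();

    // Hypothetical stand-in for the application code that builds a Lucene
    // Document from the fields and calls IndexWriter.updateDocument().
    private final BiConsumer<String, Map<String, String>> indexUpdater;

    DirtyCheckingIndexer(BiConsumer<String, Map<String, String>> indexUpdater) {
        this.indexUpdater = indexUpdater;
    }

    /** Returns true if the index was updated, false if the change was skipped. */
    boolean onObjectChanged(String id, Map<String, String> indexedFields) {
        // Defensive copy so later changes by the caller do not alter the snapshot.
        Map<String, String> snapshot = new HashMap<>(indexedFields);
        if (snapshot.equals(lastIndexed.get(id))) {
            return false; // none of the indexed attributes changed
        }
        indexUpdater.accept(id, snapshot);
        lastIndexed.put(id, snapshot);
        return true;
    }

    /** Forgets an object, for example after it is deleted from the index. */
    void onObjectRemoved(String id) {
        lastIndexed.remove(id);
    }
}
```

The snapshot map starts out empty after a restart, so the first notification for each object always goes through to the index. Whether the extra bookkeeping is worth it depends on how much churn you actually observe.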