Thanks for the response, Adrien.  I guess I'll just leave things as they are for 
now.  To be clear though, do merged segments get cleaned up completely even if 
the IndexWriter is never closed?  Currently I'm using NRT search with a single 
writer that stays open for the lifetime of the application.   This product will 
be shipped to customers, so I need the index to be entirely self-managing.

-Tommy

-----Original Message-----
From: Adrien Grand [mailto:jpou...@gmail.com] 
Sent: Wednesday, February 06, 2013 11:14 AM
To: java-user@lucene.apache.org
Subject: Re: updateDocument question

Hi Thomas,

On Wed, Feb 6, 2013 at 2:50 PM, Becker, Thomas <thomas.bec...@netapp.com> wrote:
> I've built a search prototype feature for my application using Lucene, and it 
> works great.  The application monitors a remote system and currently indexes 
> just a few core attributes of the objects on that system.  I get 
> notifications when objects change, and I then update the Lucene index to keep 
> things in sync.   The thing is that even when objects on the remote system 
> are updated, it's relatively unlikely that the specific attributes I'm 
> indexing (like name) were changed.  From what I can see, 
> IndexWriter.updateDocument() makes no effort to determine if the existing 
> document is actually dirty compared to the provided one.  My questions are:
>
> Is it true that documents are assumed to be changed and not actually 
> checked before replacement?

Yes, it's true.

> Has such a feature been considered?

I'm not sure, but I see several issues. For example, if you reindex the exact 
same document with a different analyzer, the index 
terms/positions/offsets/payloads might be different. Moreover, one can only 
perform such a comparison if the document is stored, which is something that 
Lucene doesn't enforce.

> Is it worth it to query for the document, manually dirty check it, and then 
> delete/re-add it only if it's different, given that changes to the indexed 
> fields are relatively uncommon?  My concern is that I'm inadvertently 
> causing a lot of segment churn for things that aren't actually changing.

You could try to do it, but maybe it is just fine the way it is: as segments 
get merged, deleted docs eventually get expunged.
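
For illustration only, here is a minimal sketch of one way the manual dirty
check could be done without querying the index: keep an in-memory digest of
the indexed field values per object and forward an update only when the
digest changes. Everything in it is an assumption rather than part of Lucene
or of the application discussed here: the class name DirtyCheckingUpdater,
the String-to-String field map, and the callback, which stands in for
whatever code builds a Document and calls IndexWriter.updateDocument().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical helper: skips index updates when the indexed fields are unchanged. */
final class DirtyCheckingUpdater {
    // Object id -> digest of the indexed field values last sent to the index.
    // Held in memory only, so after a restart every object is sent once more.
    private final Map<String, byte[]> lastDigests = new ConcurrentHashMap<>();

    // Performs the real update; assumed to wrap IndexWriter.updateDocument().
    private final BiConsumer<String, Map<String, String>> indexUpdate;

    DirtyCheckingUpdater(BiConsumer<String, Map<String, String>> indexUpdate) {
        this.indexUpdate = indexUpdate;
    }

    /**
     * Forwards the update only if the indexed fields differ from the last
     * version seen for this id. Returns true if the update was forwarded.
     * Assumes notifications for the same id are not processed concurrently,
     * and that field names and values are non-null.
     */
    boolean updateIfChanged(String id, Map<String, String> indexedFields) {
        byte[] current = digest(indexedFields);
        byte[] previous = lastDigests.get(id);
        if (previous != null && Arrays.equals(previous, current)) {
            return false; // nothing that is indexed has changed
        }
        indexUpdate.accept(id, indexedFields);
        lastDigests.put(id, current); // record only after the update succeeded
        return true;
    }

    /** Forgets an object, e.g. after it has been deleted from the index. */
    void remove(String id) {
        lastDigests.remove(id);
    }

    private static byte[] digest(Map<String, String> fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Sort by field name so the digest does not depend on map order.
            for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
                feed(md, e.getKey());
                feed(md, e.getValue());
            }
            return md.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 not available", ex);
        }
    }

    private static void feed(MessageDigest md, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        // Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        md.update(bytes);
    }
}
```

Comparing digests sidesteps the need for stored fields, but it only detects
changes in the field values passed in. It cannot notice a changed analyzer,
so that case would still call for a full reindex.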

--
Adrien

