>Of course, but what about the Lucene doc id doesn't provide that?

The question is how you determine the correct doc id to use in the first 
place (especially when doc ids are known to be volatile). The current answer is 
to use a stable identifier term which your app holds in the index, AKA a 
primary key. 
To support single-doc updates, app developers currently have to:
a) allocate keys uniquely
b) ensure they do not store more than one document with the same key.

My suggestion was that, since these are fundamental requirements for supporting 
updates, Lucene could, as a convenience, provide some support for them in its 
API, in the same way a database typically does.

Earwin has perhaps extended your (and my) original thinking to incorporate 
set-based updates (a single set of values applied to many documents which match 
a query).
His proposal (correct me if I'm wrong, Earwin) is that single and set-based 
changes could both be supported by a single IndexWriter.updateDocuments(query, 
changedFields) type method.
The benefit of this scheme is that we are providing a simple method which 
re-uses established concepts (Queries for document selection). However, this 
does not change the fact that many users will still need to use primary keys 
for single-doc updates, and they will have to assume responsibility for a) and 
b) above.
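
To make the set-based idea concrete, here is a rough sketch in plain Java. It is 
a toy in-memory model, not Lucene code: documents are Maps, the "query" is a 
Predicate, and the class name ToySetUpdate and the exact updateDocuments 
signature are my own assumptions for illustration only. A single-doc update is 
then just the case where the query matches on a primary key field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model only (hypothetical names): NOT the Lucene API. */
class ToySetUpdate {

    /**
     * Applies changedFields to every document matching the query and
     * returns how many documents were changed.
     */
    static int updateDocuments(List<Map<String, String>> docs,
                               Predicate<Map<String, String>> query,
                               Map<String, String> changedFields) {
        int updated = 0;
        for (Map<String, String> doc : docs) {
            if (query.test(doc)) {
                doc.putAll(changedFields); // overwrite only the changed fields
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(new HashMap<>(Map.of("id", "1", "owner", "mark", "status", "new")));
        docs.add(new HashMap<>(Map.of("id", "2", "owner", "mark", "status", "new")));
        docs.add(new HashMap<>(Map.of("id", "3", "owner", "grant", "status", "new")));

        // Set-based update: one set of values applied to many matching docs.
        int many = updateDocuments(docs,
                d -> "mark".equals(d.get("owner")), Map.of("status", "read"));

        // Single-doc update: the same method, selecting on the primary key.
        int one = updateDocuments(docs,
                d -> "3".equals(d.get("id")), Map.of("status", "archived"));

        System.out.println(many + " then " + one);
    }
}
```

Note that the single-doc case only updates exactly one document if the app has 
already honoured a) and b) for the "id" field.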

On reflection, I guess these responsibilities are not too tough.
a) is catered for by the fact that Lucene is not typically the master data 
store (yet!), and the filesystem/webserver/database datasources from which 
document content is sourced usually have the responsibility to allocate some 
form of unique identifier, in the form of URLs, database keys or filenames, 
which can be used. Also, b) is not too hard to handle in app code if you always 
use the IndexWriter.updateDocument(term,doc) method for inserts.
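
As a rough illustration of why that works, here is a toy model in plain Java. 
It is not Lucene code: the ToyIndex class and its methods are hypothetical, doc 
ids are modelled simply as list positions, and updateDocument is modelled as 
"delete everything with this key, then add". That is enough to show both 
points: keys stay unique if every insert goes through the update path, while 
the doc id behind a given key can change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy in-memory model (hypothetical): doc ids are list positions. */
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Deletes every document whose field equals value, then appends doc. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    /** Current (volatile) position of the document with this key, or -1. */
    int docIdOf(String field, String value) {
        for (int i = 0; i < docs.size(); i++) {
            if (value.equals(docs.get(i).get(field))) {
                return i;
            }
        }
        return -1;
    }

    int size() {
        return docs.size();
    }

    String get(int docId, String field) {
        return docs.get(docId).get(field);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.updateDocument("id", "a", Map.of("id", "a", "title", "first"));
        index.updateDocument("id", "b", Map.of("id", "b", "title", "second"));
        int before = index.docIdOf("id", "a");

        // Re-submitting key "a" replaces the old document, no duplicate.
        index.updateDocument("id", "a", Map.of("id", "a", "title", "revised"));
        int after = index.docIdOf("id", "a");

        System.out.println("docs=" + index.size()
                + " docIdBefore=" + before + " docIdAfter=" + after);
    }
}
```

In this toy, the key "a" still identifies exactly one document after the 
update, but its position has moved, which is the reason the app cannot hold on 
to a doc id as its handle.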


Cheers,
Mark




________________________________
From: Grant Ingersoll <gsing...@apache.org>
To: java-dev@lucene.apache.org
Sent: Mon, 29 March, 2010 13:11:56
Subject: Re: Incremental Field Updates



On Mar 29, 2010, at 2:26 AM, Mark Harwood wrote:


>
>
>>
>>Of course introducing the idea of updates also introduces the notion of a 
>>primary key and there's probably an entirely separate discussion to be had 
>>around user-supplied vs Lucene-generated keys.
>>
>>
>>Not sure I see that need.  Can you explain your reasoning a bit more?
>
>>>
>
>
>If you want to update a document you need a way of expressing *which* document 
>you are updating.

Of course, but what about the Lucene doc id doesn't provide that?


      
