On Mar 29, 2010, at 10:11 AM, mark harwood wrote:

> >Of course, but what about the Lucene doc id doesn't provide that?
> 
> The question being how you determine the correct doc id to use in the first 
> place (especially when they are known to be volatile) - the current answer is 
> to use a stable identifier term which your app holds in the index, AKA a 
> primary key. 
> To support single-doc updates, app developers currently have to:
> a) allocate keys uniquely
> b) ensure they do not store >1 document with the same key.
> 
> My suggestion was that, since these are fundamental requirements for 
> supporting updates, Lucene could, as a convenience, provide some support for 
> this in its API - in the same way a database typically does.

I don't think Lucene needs a primary key.  I don't see why this number can't be 
determined in the usual ways.

> 
> Earwin has perhaps extended your (and my) original thinking to incorporate 
> set-based updates (a single set of values applied to many documents which 
> match a query).
> His proposal (correct me if I'm wrong, Earwin) is that single and set-based 
> changes could both be supported by a single 
> IndexWriter.updateDocuments(query, changedFields) type method.
> The benefit of this scheme is that we are providing a simple method, re-using 
> established concepts (Queries for document selection) but this does not 
> change the fact that many users will still need to use primary keys for 
> single-doc updates and they have to assume responsibility for a) and b) above.

Hmmm, this sounds like the Parallel Incremental Indexing that Busch has put up 
in a patch.
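
To make the quoted updateDocuments(query, changedFields) idea concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not Lucene code: the class SetBasedUpdateSketch, the use of a Predicate as the "query", and the returned update count are all assumptions made for illustration, and the method itself is only a proposal in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy stand-in for an index (hypothetical, not a Lucene class). */
final class SetBasedUpdateSketch {
    // Each entry is one "document": field name -> field value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc)); // copy so callers cannot mutate stored docs
    }

    /**
     * Hypothetical set-based update: overwrite the changed fields on every
     * document the query selects. Returns how many documents were touched.
     */
    int updateDocuments(Predicate<Map<String, String>> query,
                        Map<String, String> changedFields) {
        int updated = 0;
        for (Map<String, String> doc : docs) {
            if (query.test(doc)) {
                doc.putAll(changedFields);
                updated++;
            }
        }
        return updated;
    }

    /** Reads one field of the document at the given list position. */
    String fieldValue(int position, String field) {
        return docs.get(position).get(field);
    }
}
```

A single-doc update is then just the special case where the query matches one primary-key value, which is why the application still has to keep those keys unique.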

> 
> On reflection, I guess these responsibilities are not too tough.
> a) is catered for by the fact that Lucene is not typically the master data 
> store (yet!) and filesystem/webserver/database datasources where document 
> content is sourced usually have the responsibility to allocate some form of 
> unique identifier in the form of URLs, database keys or filenames which can 
> be used. Also, b) is not too hard to handle in app code if you always use the 
> IndexWriter.updateDocument(term,doc) method for inserts.
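
The point about b) can be illustrated with a small sketch. IndexWriter.updateDocument(term, doc) deletes any documents containing the term and then adds the new document, so routing every insert through it keeps at most one document per key. The class below is a toy in-memory model of that delete-then-add behaviour in plain Java, not Lucene code; KeyedIndexSketch and its methods are hypothetical names, and the list position is only a rough stand-in for a volatile internal doc id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index keyed by an application-supplied field. */
final class KeyedIndexSketch {
    // Each entry is one "document": field name -> field value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Plain add: nothing stops two documents from sharing a key. */
    void addDocument(Map<String, String> doc) {
        docs.add(new LinkedHashMap<>(doc));
    }

    /** Delete-then-add: removes every doc carrying the key, then appends. */
    void updateDocument(String keyField, String keyValue, Map<String, String> doc) {
        docs.removeIf(d -> keyValue.equals(d.get(keyField)));
        docs.add(new LinkedHashMap<>(doc));
    }

    /** Number of documents that carry the given key value. */
    int countByKey(String keyField, String keyValue) {
        int count = 0;
        for (Map<String, String> d : docs) {
            if (keyValue.equals(d.get(keyField))) {
                count++;
            }
        }
        return count;
    }

    /** Current list position of the first doc with this key, or -1 if absent. */
    int positionOf(String keyField, String keyValue) {
        for (int i = 0; i < docs.size(); i++) {
            if (keyValue.equals(docs.get(i).get(keyField))) {
                return i;
            }
        }
        return -1;
    }

    int size() {
        return docs.size();
    }
}
```

In this model, updating a key twice leaves one document for that key, while the positions of the documents shift, which is the reason a stable key is used to address a document instead of its position.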
> 
> 
> Cheers,
> Mark
> 
> From: Grant Ingersoll <gsing...@apache.org>
> To: java-dev@lucene.apache.org
> Sent: Mon, 29 March, 2010 13:11:56
> Subject: Re: Incremental Field Updates
> 
> 
> On Mar 29, 2010, at 2:26 AM, Mark Harwood wrote:
> 
>> 
>>> 
>>>> Of course introducing the idea of updates also introduces the notion of a 
>>>> primary key and there's probably an entirely separate discussion to be had 
>>>> around user-supplied vs Lucene-generated keys.
>>> 
>>> Not sure I see that need.  Can you explain your reasoning a bit more?
>>>> 
>> 
>> If you want to update a document you need a way of expressing *which* 
>> document you are updating.
> 
> Of course, but what about the Lucene doc id doesn't provide that?
> 
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
