Re: incremental document field update

Babak Farhang Thu, 14 Jan 2010 01:28:13 -0800

> Reading that trail, I wish the original poster gave up on his idea (

Err, that should have read..


"Reading that trail, I wish the original poster hadn't given up on his idea"


On Thu, Jan 14, 2010 at 2:23 AM, Babak Farhang <[email protected]> wrote:
> Hi,
>
> I've been thinking about how to update a single field of a document
> without touching its other fields. This is an old problem and I was
> considering a solution along the lines of Andrzej Bialecki's post to
> the dev list back in '07:
>
>
> <quote  http://markmail.org/message/tbkgmnilhvrt6bii >
>
> I have the following scenario: I want to use ParallelReader to
> maintain parts of the index that are changing quickly, and where
> changes are limited to specific fields only.
>
> Let's say I have a "main" index (many fields, slowly changing, large
> updates), and an "aux" index (fast changing, usually single doc and
> single field updates). I'd like to "replace" documents in the "aux"
> index - that is, delete one doc and add another - but in a way that
> doesn't change the internal document numbers, so that I can keep the
> mapping required by ParallelReader intact.
>
> I think this is possible to achieve by using a FilterIndexReader,
> which keeps a map of updated documents, and re-maps old doc ids to the
> new ones on the fly.
>
> From time to time I'd like to optimize the "aux" index to get rid of
> deleted docs. At this time I need to figure out how to preserve the
> old->new mapping during the optimization.
>
> So, here's the question: is this scenario feasible? If so, then in the
> trunk/ version of Lucene, is there any way to figure out (predictably)
> how internal document numbers are reassigned after calling optimize()
> ?
>
> </quote>
>
>
> Reading that trail, I wish the original poster gave up on his idea (
> http://markmail.org/message/tbkgmnilhvrt6bii#query:+page:1+mid:kn77zpiu43kd2ufn+state:results
> )
>
>
> <quote>
> Thanks for the input - for now I gave up on this, after discovering
> that I would have no way to ensure in TermDocs.skipTo() that document
> id-s are monotonically increasing (which seems to be a part of the
> contract).
> </quote>
>
> I imagine if Andrzej's proposed FilterIndexReader maintains 2 sorted
> (ordered) maps, one from internal document-ids to "view" document-ids,
> and another mapping from "view" document-ids to internal
> document-ids, then things like skipTo() can be implemented reasonably
> efficiently. Only the mapped ids are maintained in these structures.
> (Also note that a mapped "view" document-id represents an internally
> deleted document with that id.)
>
> And if we can find a way to merge the segments of this "aux" index
> along whenever the segments of its associated "main" index are merged
> or optimized (such that the [internal] doc-ids in the merged aux index
> end up getting sync'ed with those of the trunk), then there shouldn't
> be all that many doc-ids to map anyway (if we merge frequently
> enough).
>
> So to go back to Andrzej's question: is there any way to figure out
> (predictably) how internal document numbers [in the main index] are
> reassigned after calling optimize() ? How does LUCENE-847, as Doug
> Cutting suggests in that trail, help?
>
> Sorry if that was long winded, had to start somewhere ;)
>
> -Babak
>
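
To make the two-map idea above a bit more concrete, here is a minimal sketch in
plain Java. It is not Lucene code and not what Andrzej proposed verbatim: the
class name DocIdRemap, its methods, and the use of a sorted int[] as a stand-in
for one term's postings are all assumptions made for illustration. It also
assumes that the postings array holds only live (non-deleted) internal ids, and
that replacement docs are appended to the aux index with fresh internal ids.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the two sorted maps; not part of Lucene. */
final class DocIdRemap {
    // "view" id -> internal id of the replacement doc (only updated docs are stored)
    private final TreeMap<Integer, Integer> viewToInternal = new TreeMap<>();
    // internal id of the replacement doc -> "view" id it stands in for
    private final TreeMap<Integer, Integer> internalToView = new TreeMap<>();

    /** Record that the doc visible at viewId now lives at newInternalId. */
    void remap(int viewId, int newInternalId) {
        Integer previous = viewToInternal.put(viewId, newInternalId);
        if (previous != null) {
            internalToView.remove(previous); // an earlier replacement was itself replaced
        }
        internalToView.put(newInternalId, viewId);
    }

    int toInternal(int viewId) {
        Integer internal = viewToInternal.get(viewId);
        return internal != null ? internal : viewId; // unmapped ids are identical
    }

    int toView(int internalId) {
        Integer view = internalToView.get(internalId);
        return view != null ? view : internalId;
    }

    /**
     * Returns the smallest view id >= target that matches the term, or -1.
     * postings: sorted internal ids of the live docs containing the term.
     */
    int skipTo(int[] postings, int target) {
        int best = -1;
        // Candidate 1: first posting >= target that is not a replacement doc,
        // so its view id equals its internal id.
        int i = Arrays.binarySearch(postings, target);
        if (i < 0) {
            i = -i - 1; // insertion point
        }
        for (; i < postings.length; i++) {
            if (!internalToView.containsKey(postings[i])) {
                best = postings[i];
                break;
            }
        }
        // Candidate 2: first mapped view id >= target whose replacement doc
        // contains the term. Stop as soon as candidate 1 cannot be beaten.
        for (Map.Entry<Integer, Integer> e : viewToInternal.tailMap(target, true).entrySet()) {
            if (best != -1 && e.getKey() >= best) {
                break;
            }
            if (Arrays.binarySearch(postings, e.getValue()) >= 0) {
                best = e.getKey();
                break;
            }
        }
        return best;
    }
}
```

With remap(4, 10) and remap(2, 11) and postings {1, 3, 5, 10, 11}, repeated
skipTo() calls hand back view ids in increasing order (1, 2, 3, 4, 5), which is
the property the skipTo() contract seems to need. The cost of candidate 2 grows
with the number of mapped ids, which is why keeping that number small through
frequent merging matters.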

