+1 for having a contrib/transactionlog that apps could use, outside of
Solr/ElasticSearch.

And it sounds like one cannot build such a thing unless one forces an
order above Lucene (like ElasticSearch), or, we make it possible to
see/control the order of ops inside IW?

Even ES's approach is limited, since it only works because it only
deletes-by-ID?  And not by random Term or Query, etc.  This way ES
"only" must ensure the order when the same ID is being updated; the
order across different IDs is unimportant.
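
For illustration, the lock-per-ID scheme described above could be sketched roughly as follows. This is not ElasticSearch's actual code: the class name `StripedIdLocks`, the stripe count, and the use of `String.hashCode()` are all assumptions made for the example. The only point is that two updates to the same ID serialize on the same lock, while updates to different IDs usually proceed in parallel.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: an array of locks selected by the hash of the document ID. */
class StripedIdLocks {
    private final ReentrantLock[] stripes;

    StripedIdLocks(int stripeCount) {
        stripes = new ReentrantLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Equal IDs always map to the same stripe; different IDs may collide. */
    int stripeFor(String id) {
        return Math.floorMod(id.hashCode(), stripes.length);
    }

    /**
     * Runs the update (for example: append to the transaction log, then
     * hand the document to the writer) while holding the lock for this ID.
     */
    void withLock(String id, Runnable update) {
        ReentrantLock lock = stripes[stripeFor(id)];
        lock.lock();
        try {
            update.run();
        } finally {
            lock.unlock();
        }
    }
}
```

Note that nothing here helps with delete-by-Query: a query has no single ID to hash, so there is no stripe to lock.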

Returning a long seqID seems the least invasive change to make this
total ordering possible?  Especially since the DWDQ already computes
this order...

This would presumably mean, as long as ES cut over to ordering the
entries in the transaction log according to the returned seqID, that it
could then remove the array of locks, and freely allow even docs w/ the
same ID to be updated "at once" and IW picks which one wins?
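
A minimal sketch of that idea, under stated assumptions: `SeqIdWriter` is a hypothetical stand-in for a writer that returns a long seqID per operation (it is not the IndexWriter API under discussion), and `SeqOrderedLog` and `LogEntry` are likewise made up for the example. Threads append entries without any locking above the writer, and recovery replays them in seqID order, so the entry with the highest seqID for a given ID is the one that "wins".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a writer that reports the order in which it saw each op. */
class SeqIdWriter {
    private final AtomicLong nextSeqId = new AtomicLong();

    /** Pretends to apply the update and returns its sequence ID. */
    long update(String id, String body) {
        return nextSeqId.incrementAndGet();
    }
}

/** One transaction log entry, tagged with the writer-assigned seqID. */
final class LogEntry {
    final long seqId;
    final String id;
    final String body;

    LogEntry(long seqId, String id, String body) {
        this.seqId = seqId;
        this.id = id;
        this.body = body;
    }
}

/** Log that accepts entries in any arrival order and replays them by seqID. */
class SeqOrderedLog {
    private final ConcurrentLinkedQueue<LogEntry> entries = new ConcurrentLinkedQueue<>();

    void append(long seqId, String id, String body) {
        entries.add(new LogEntry(seqId, id, body));
    }

    /** Returns a copy of the entries sorted in the order the writer applied them. */
    List<LogEntry> replayOrder() {
        List<LogEntry> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> Long.compare(a.seqId, b.seqId));
        return sorted;
    }
}
```

The design choice here is that the log never has to agree with the writer up front; it only has to record the seqID the writer hands back, and sort on it at replay time.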

I hope this will not somehow mean that apps (or IW) will need/want to
suddenly save arrays mapping docID (or appID) to seqID....

Mike McCandless

http://blog.mikemccandless.com

On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer
<simon.willna...@googlemail.com> wrote:
> hey folks,
>
> we already have transaction logging on Solr side so I should have
> started this discussion earlier. However, I want to bring this up to
> the list since I think this is a very valuable feature also for plain
> Lucene users and eventually this should also be available to them. I
> don't think this needs to be a core feature at all but I think we need
> to provide the necessary hooks in Lucene core to make this reliable
> and consistent. I have a couple of concerns that with the current
> extension mechanism we provide on the IndexWriter side this feature
> can only be implemented in a sub-optimal way on the Solr side (or
> basically on top of Lucene), but lemme elaborate on this a little.
>
> IndexWriter doesn't provide any transaction guarantees, nor does it
> give any guarantees on the order. So if you index two versions of a
> document with the same delete key you can't tell which one wins unless
> you prevent IW from seeing those two documents at the same time, i.e.
> by locking before you hit IW. This is basically what other
> implementations do, like ElasticSearch, which uses locks assigned to
> buckets in an array selected based on the del term's hash. However
> this gets a little more
> complex once you get to DeleteQueries where you can't tell which
> document is affected so they might be misplaced in the transaction log
> if the order doesn't match the order the IW sees. Under the hood IW
> does maintain such an order inside the DocumentsWriterDeleteQueue
> which could be utilized to provide a total ordering that IMO should be
> reflected in the transaction log.
>
> Before I propose ways of how this could be implemented I
> want to check if others think we should provide more reliable ways for
> users with the need for durability and consistent recovery.
>
> simon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
