This isn't a new problem.  Databases have been around for what, 30+ years?

On Thu, Sep 8, 2011 at 11:01 AM, Simon Willnauer
<simon.willna...@googlemail.com> wrote:
> On Thu, Sep 8, 2011 at 4:21 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
>> The delete by query is solved by recording the primary key / UID of
>> the deleted document(s).  It's only expensive if the transaction log
>> implementation is not designed properly.  :)
>
> Phew, I don't think this is realistic. This could be a lot of
> documents and a lot of primary key lookups; you also need to know
> what the primary key is, and you somehow need to do this
> asynchronously. I don't consider this an option.
>
> simon
>>
>> On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer
>> <simon.willna...@googlemail.com> wrote:
>>> hey folks,
>>>
>>> we already have transaction logging on the Solr side, so I should have
>>> started this discussion earlier. However, I want to bring this up on
>>> the list since I think this is a very valuable feature for plain
>>> Lucene users too, and eventually it should be available to them. I
>>> don't think this needs to be a core feature at all, but I think we need
>>> to provide the necessary hooks in Lucene core to make this reliable
>>> and consistent. I have a couple of concerns: with the current
>>> extension mechanism we provide on the IndexWriter side, this feature
>>> can only be implemented in a sub-optimal way in Solr (or basically
>>> anything on top of Lucene), but let me elaborate on this a little.
>>>
>>> IndexWriter doesn't provide any transaction guarantees, nor does it
>>> give any guarantees on ordering. So if you index two versions of a
>>> document with the same delete key, you can't tell which one wins unless
>>> you prevent IW from seeing those two documents at the same time, i.e.
>>> by locking before you hit IW. This is basically what other
>>> implementations do, such as ElasticSearch, which uses locks assigned to
>>> buckets in an array selected by the hash of the delete term. However,
>>> this gets a little more complex once you get to delete queries, where
>>> you can't tell which documents are affected, so they might be misplaced
>>> in the transaction log if its order doesn't match the order IW sees.
>>> Under the hood, IW does maintain such an order inside the
>>> DocumentsWriterDeleteQueue, which could be used to provide a total
>>> ordering that IMO should be reflected in the transaction log.
>>>
>>> Before I propose ways this could be implemented, I want to check
>>> whether others think we should provide more reliable mechanisms for
>>> users who need durability and consistent recovery.
>>>
>>> simon
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>
>>
>
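
For illustration, the bucket-lock approach mentioned in the quoted message
(locks held in an array and selected by the hash of the delete term) might
look roughly like the Java sketch below. Everything in it is an assumption
made for the example: the class name StripedUpdateLog, the in-memory list
standing in for a transaction log, and the Runnable standing in for the
call into IndexWriter. It is not how Solr, ElasticSearch, or Lucene
actually implement this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: striped locks chosen by the hash of the delete term. */
class StripedUpdateLog {
    private final ReentrantLock[] locks;
    // Stand-in for a real transaction log (assumption for this example).
    private final List<String> log = new ArrayList<>();

    StripedUpdateLog(int stripes) {
        locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    /** Maps a delete term to a bucket; floorMod keeps the index non-negative. */
    int stripeFor(String deleteTerm) {
        return Math.floorMod(deleteTerm.hashCode(), locks.length);
    }

    /**
     * Logs the update and applies it to the writer while holding the stripe
     * lock, so two updates with the same delete term cannot interleave and
     * their order in the log matches the order the writer sees them.
     */
    void update(String deleteTerm, String doc, Runnable applyToWriter) {
        ReentrantLock lock = locks[stripeFor(deleteTerm)];
        lock.lock();
        try {
            // Different stripes may append concurrently, so guard the list.
            synchronized (log) {
                log.add(deleteTerm + "=" + doc);
            }
            applyToWriter.run(); // e.g. a call to IndexWriter.updateDocument
        } finally {
            lock.unlock();
        }
    }

    /** Returns a copy of the log entries in append order. */
    List<String> snapshot() {
        synchronized (log) {
            return new ArrayList<>(log);
        }
    }
}
```

Note that this only orders updates sharing a delete term. As the quoted
message points out, it does not help with delete queries, where the
affected documents (and therefore the stripes to lock) are not known up
front.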

