Yup, I also meant 'eventually consistent' when I said such inconsistencies should be acceptable. At some point after transactions have been committed, topology changes have been handled (state transfer completed), and we have reached a steady state, we should see a consistent index when querying.
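
One way to read "eventually consistent" operationally, e.g. in a test, is to retry the query until the index reflects the committed data or a timeout expires. Below is a minimal sketch in plain Java under that assumption. The names EventuallyConsistentCheck and eventually, and the in-memory set standing in for the index, are hypothetical and not part of the Infinispan or Hibernate Search API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class EventuallyConsistentCheck {

    /**
     * Polls the condition until it holds or the timeout expires.
     * Returns true if the condition held in time, false otherwise
     * (including when the waiting thread is interrupted).
     */
    public static boolean eventually(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for an index that is updated asynchronously
        // some time after the transaction has committed.
        Set<String> index = ConcurrentHashMap.newKeySet();

        Thread asyncIndexer = new Thread(() -> {
            try {
                Thread.sleep(50); // simulated indexing delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            index.add("person-1");
        });
        asyncIndexer.start();

        // The "query" is retried until the index has caught up.
        boolean consistent = eventually(() -> index.contains("person-1"), 5000, 10);
        asyncIndexer.join();
        System.out.println("index consistent: " + consistent);
    }
}
```

The timeout is only a bound for tests; it says nothing about how long a real cluster takes to reach a steady state.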

On 07/31/2017 11:41 AM, Gustavo Fernandes wrote:
IMO, indexing should be eventually consistent, as this offers the best performance.

On tx-caches, although Lucene has hooks to be enlisted in a transaction [1], some backends (Elasticsearch) don't expose this, and Hibernate Search by design doesn't make use of it. So currently we must deal with inconsistencies after the fact: checking for nulls, mismatched types and so on.

[1] https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/index/TwoPhaseCommit.html
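
The after-the-fact check mentioned above (nulls, mismatched types) can be sketched roughly as follows. This is a plain-Java illustration, not the actual Infinispan query code: QueryResultPostCheck, loadAndFilter, the Person and Animal classes, and the Map standing in for the cache are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryResultPostCheck {

    // Hypothetical entity types, mirroring the Person/Animal case from the thread.
    public static class Person {
        public final String name;
        public Person(String name) { this.name = name; }
    }

    public static class Animal {
        public final String species;
        public Animal(String species) { this.species = species; }
    }

    /**
     * Loads the values for the keys returned by a (possibly stale) index and
     * keeps only those that still exist and are of the requested type.
     * Class.isInstance returns false for null, so removed or evicted entries
     * are dropped by the same check.
     */
    public static <T> List<T> loadAndFilter(List<String> keysFromIndex,
                                            Map<String, Object> cache,
                                            Class<T> expectedType) {
        List<T> results = new ArrayList<>();
        for (String key : keysFromIndex) {
            Object value = cache.get(key);
            if (expectedType.isInstance(value)) {
                results.add(expectedType.cast(value));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("k1", new Person("Alice"));
        cache.put("k2", new Animal("cat")); // overwritten after the index was updated
        // "k3" was removed or evicted after the index was updated

        // The stale index still claims that all three keys hold a Person.
        List<String> keysFromIndex = Arrays.asList("k1", "k2", "k3");

        List<Person> persons = loadAndFilter(keysFromIndex, cache, Person.class);
        System.out.println("persons returned: " + persons.size());
    }
}
```

Note that this only covers nulls and type mismatches. It does not re-evaluate the query predicate, so an entry of the right type that no longer matches the query would still be returned.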


On Fri, Jul 28, 2017 at 1:59 PM, Adrian Nistor <anis...@redhat.com> wrote:

    My feeling regarding this was to accept such inconsistencies, but maybe
    I'm wrong. I've always regarded indexing as being async in general, even
    though it did behave as if being sync in some not so rare circumstances,
    which probably made people believe it is expected to be sync in general.
    I'm curious what Sanne and Gustavo have in mind.

    Please note that updating the index synchronously during tx commit was
    always regarded as a performance bottleneck, so it was out of the question.

    And that would not always work anyway; it all depends on the
    underlying indexing technology. For example, when using HS with
    Elasticsearch you have to accept that Elasticsearch indexing is always
    async.

    And there might not be an index at all. It's very possible that the
    query runs unindexed. In that case it will use distributed streams, which
    have their own transaction issues.

    In the past we had some bugs where a matching entry was deleted/evicted
    right before the search results were returned to the user, so loading of
    those values failed silently. Those queries mistakenly returned
    some unexpected nulls among other valid results. The fix was to just
    filter out those nulls. We could enhance that to double-check that the
    returned entry is indeed of the requested type, to also cover the issue
    that you encountered.

    Adrian

    On 07/28/2017 01:38 PM, Radim Vansa wrote:
    > Hi,
    >
    > while working on ISPN-7806 I am wondering how queries should work with
    > transactions. Right now it seems that updates to the index are done during
    > either regular command execution (on originator [A]) or prepare command
    > on remote nodes [B]. Both of these cause rolled-back transactions to be
    > seen, so these must be treated as bugs [C].
    >
    > If we index the data after committing the transaction, there would be a
    > time window when we could see the updated entries but the index would
    > not reflect that. That might be an acceptable limitation if a
    > query misses some matching entity, but it's also possible that we
    > retrieve the query result key-set and then (after retrieving full
    > entities) we return something that does not match the query. One of the
    > reproducers for ISPN-7806 I've written [1] triggers a situation where
    > listing all Persons could return Animal (different entity type), so I
    > think that there's no validity post-check (though these reproducers
    > don't use transactions).
    >
    > Therefore, I wonder if the index should contain only the key; maybe we
    > should store a unique version and invalidate the query if some of the
    > entries have changed.
    >
    > If we index the data before committing the transaction, a similar
    > situation could happen: the index will return keys for entities that
    > will match in the future but the actually returned list will contain
    > stale entities.
    >
    > What's the overall plan? Do we just accept inconsistencies? In that
    > case, please add a verbose statement in docs and point me to that.
    >
    > And if I've misinterpreted something and raised the red flag in error,
    > please let me know.
    >
    > Radim
    >
    > [A] This seems to be a regression after moving towards async
    > interceptors - our impl of
    > org.hibernate.search.backend.TransactionContext is incorrectly bound to
    > TransactionManager. Then we seem to be running outside of a transaction
    > and are happy to index it right away. The thread that executes the
    > interceptor handler is also dependent on ownership (due to remote
    > LockCommand execution), so I think that it does not fail the local-mode
    > tests.
    >
    > [B] ... and it does so twice as a regression after ISPN-7840 but that's
    > easy to fix.
    >
    > [C] Indexing in prepare command was OK before ISPN-7840 with pessimistic
    > locking which does not send the CommitCommand, but now that the QI has
    > been moved below EWI it means that we're indexing before storing the
    > actual values. Optimistic locking was not correct, though.
    >
    > [1] https://github.com/rvansa/infinispan/commit/1d62c9b84888c7ac21a9811213b5657aa44ff546
    >
    >

    _______________________________________________
    infinispan-dev mailing list
    infinispan-dev@lists.jboss.org
    https://lists.jboss.org/mailman/listinfo/infinispan-dev



