Alexey,

>> In short, the root cause of this issue is that there are configurations
>> that allow a key to be stored on primary and backup nodes with different
>> versions.
I ran into the same problem during ReadRepair development.

>> I suggest forcing reads from a primary
>> node inside optimistic serializable transactions.
It looks like a proper fix (read-from-backup= ... && !read-through).
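
Just to make the scenario concrete, below is a rough, self-contained sketch
(a multi-node cluster is needed to actually hit it, and names such as
"tx-cache" and ExternalStore are made up for illustration): a cache with
backups, read-from-backup and read-through, plus an optimistic serializable
transaction whose commit keeps failing once the primary and backup versions
diverge.

import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionConcurrency;
import org.apache.ignite.transactions.TransactionIsolation;
import org.apache.ignite.transactions.TransactionOptimisticException;

public class OptimisticSerializableReadSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // The combination that can leave primary and backup with different
        // entry versions: backups + read-from-backup + read-through.
        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("tx-cache");
        ccfg.setBackups(1);
        ccfg.setReadFromBackup(true);  // reads may be served by a backup copy
        ccfg.setReadThrough(true);     // misses are loaded from a 3rd-party store
        ccfg.setCacheStoreFactory(FactoryBuilder.factoryOf(ExternalStore.class));

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache(ccfg);

        // An optimistic serializable transaction validates entry versions on
        // commit. If the version read from a backup never matches the version
        // on the primary node, the commit keeps failing.
        try (Transaction tx = ignite.transactions().txStart(
            TransactionConcurrency.OPTIMISTIC, TransactionIsolation.SERIALIZABLE)) {
            String val = cache.get(1);       // may be answered by a backup node
            cache.put(1, val + "-updated");
            tx.commit();                     // version check against the primary
        }
        catch (TransactionOptimisticException e) {
            // Retry is the usual reaction; with the desync from [2] a retry
            // never helps.
        }
    }

    /** Stand-in for a real 3rd-party store (JDBC, REST, etc.). */
    public static class ExternalStore extends CacheStoreAdapter<Integer, String> {
        @Override public String load(Integer key) {
            return "value-from-external-db-" + key;
        }

        @Override public void write(Cache.Entry<? extends Integer, ? extends String> e) {
            // no-op for the sketch
        }

        @Override public void delete(Object key) {
            // no-op for the sketch
        }
    }
}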

>> I would suggest revisiting the
>> read-through and TTL expiry semantics.
Do we really need these features?
- We have great, full-featured, consistent persistence; what is the point of
using limited, inconsistent persistence via an external database? Can we get
rid of this feature in 3.0?
- Expiry policy is expensive (it slows down the cluster), does not guarantee
in-time removal, and can always be replaced by a proper design (a state
machine, a query, eviction, an in-memory cluster restart, etc.); a sketch of
a query-based replacement is below.
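
For context, this is roughly what the questioned combination looks like, plus
a query-based alternative to per-entry TTL. It is only a sketch: the
"accounts" cache, the Account value class, the JdbcAccountStore class and the
'updated' column are made-up examples.

// Read-through into an external database plus per-entry TTL expiry.
CacheConfiguration<Long, Account> ccfg = new CacheConfiguration<>("accounts");
ccfg.setReadThrough(true);
// JdbcAccountStore stands for a hypothetical CacheStore<Long, Account> impl.
ccfg.setCacheStoreFactory(FactoryBuilder.factoryOf(JdbcAccountStore.class));
ccfg.setExpiryPolicyFactory(
    CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 30)));

// A possible TTL replacement: a cleanup query the application runs on its
// own schedule, assuming Account is SQL-enabled with an indexed 'updated'
// timestamp column and 'cache' is an IgniteCache<Long, Account>.
cache.query(new SqlFieldsQuery("DELETE FROM Account WHERE updated < ?")
    .setArgs(System.currentTimeMillis() - TimeUnit.MINUTES.toMillis(30)))
    .getAll();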

On Thu, Mar 5, 2020 at 12:33 PM Alexey Goncharuk <alexey.goncha...@gmail.com>
wrote:

> Igniters,
>
> I have recently discovered [1] that Ignite can arrive in a state when an
> optimistic serializable transaction can never be successfully committed
> from a backup node [2].
>
> In short, the root cause of this issue is that there are configurations
> that allow a key to be stored on primary and backup nodes with different
> versions. This is a fundamental design choice that was made a while ago;
> however, I am not sure it is the right way to go. When primary and backup
> versions differ and read load balancing is enabled, the read version will
> always mismatch the primary version, and the optimistic serializable
> transaction will always fail.
>
> Here I wanted to discuss both a short-term mitigation plan for the issue
> [2] and longer-term changes to the replication protocol.
>
> As a short-term solution for [2] I suggest forcing reads from a primary
> node inside optimistic serializable transactions. The question is whether
> to enforce this behavior only if the cache has a 3rd-party persistence
> store or to always enforce it. Note that the version mismatch may appear
> even without a 3rd-party persistence store when an expiry policy is used.
> However, in this case the version mismatch is time-bound to the TTL
> cleanup lag. Personally, I would go with always enforcing primary-node
> reads inside an optimistic serializable transaction.
>
> As a long-term solution that would eliminate the possibility of version
> desync on primary and backup nodes, I would suggest revisiting the
> read-through and TTL expiry semantics. It looks like quite a lot of users
> are struggling with the current implementation of read-through because a
> miss does not load the value to all partition nodes [3]. As for TTL, I
> remember that clearing entries up locally was a big issue for a proper
> MVCC rebalance implementation (we ended up prohibiting TTL for MVCC
> caches).
> I think it may be better to make read-through and entry expiry
> partition-wide operations with the underlying cache guarantees. For
> read-through this is justified because the penalty of a partition-wide
> operation is comparable with the cache store load anyway (otherwise, a
> 3rd-party store makes little sense). For entry expiration it should not
> make any difference because it happens in the background anyway.
>
> Any thoughts on the subject are very much appreciated.
>
> --AG
>
> [1]
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-NodeOrder-in-GridCacheVersion-td46108.html
> [2] https://issues.apache.org/jira/browse/IGNITE-12739
> [3]
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Re-Read-through-not-working-as-expected-in-case-of-Replicated-cache-td46083.html
>
