Igniters,

I have recently discovered [1] that Ignite can end up in a state in which an optimistic serializable transaction can never be successfully committed from a backup node [2].

In short, the root cause of this issue is that some configurations allow a key to be stored on the primary and backup nodes with different versions. This is a fundamental design choice that was made a while ago; however, I am not sure it is the right way to go. When the primary and backup versions differ and read load balancing is enabled, the read version will never match the primary version, and the optimistic serializable transaction will always fail.

Here I want to discuss both a short-term mitigation plan for the issue [2] and longer-term changes to the replication protocol.

As a short-term solution for [2], I suggest forcing reads from the primary node inside optimistic serializable transactions. The question is whether to enforce this behavior only if the cache has a 3rd-party persistence storage, or always. Note that the version mismatch may appear even without a 3rd-party persistence storage when an expiry policy is used. However, in this case the version mismatch is time-bound by the TTL cleanup lag. Personally, I would go with always enforcing primary-node reads inside an optimistic serializable transaction.

As a long-term solution that would eliminate the possibility of version desync between primary and backup nodes, I would suggest revisiting the read-through and TTL expiry semantics. It looks like quite a lot of users are actually struggling with the current implementation of read-through because a miss does not load the value to all partition nodes [3]. As for TTL, I remember that clearing entries locally was a big issue for a proper MVCC rebalance implementation (we ended up prohibiting TTL for MVCC caches).

I think it may be better to make read-through and entry expiry partition-wide operations with the underlying cache guarantees. For read-through this is justified because the penalty of a partition-wide operation is comparable to the cost of a cache store load anyway (otherwise, a 3rd-party storage makes little sense).
For entry expiration it should not make any difference because it happens in the background anyway.

Any thoughts on the subject are very much appreciated.

--AG

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-NodeOrder-in-GridCacheVersion-td46108.html
[2] https://issues.apache.org/jira/browse/IGNITE-12739
[3] http://apache-ignite-developers.2346864.n4.nabble.com/Re-Read-through-not-working-as-expected-in-case-of-Replicated-cache-td46083.html
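
For illustration, the failure mode and the proposed short-term mitigation can be sketched with a toy model in Java. This is not Ignite code: the class VersionMismatchSketch, its Node type, and the methods read, commit, and attemptsToCommit are hypothetical names made up for this sketch, and entry versions are reduced to plain longs. The model also assumes that a successful commit stamps the same new version on both copies. It only shows why a read served by the backup can never pass optimistic validation against the primary while the two versions differ, and why forcing the read to the primary avoids that.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only, not Ignite code. All names here are hypothetical. */
class VersionMismatchSketch {
    /** One copy of a partition: key -> entry version. */
    static final class Node {
        final Map<String, Long> versions = new HashMap<>();
    }

    final Node primary = new Node();
    final Node backup = new Node();

    /** Returns the entry version that the transaction records at read time. */
    long read(String key, boolean forcePrimary) {
        Node src = forcePrimary ? primary : backup;
        return src.versions.getOrDefault(key, 0L);
    }

    /**
     * Optimistic validation at commit time: the version seen by the read
     * must equal the current version on the primary.
     */
    boolean commit(String key, long readVer) {
        long primaryVer = primary.versions.getOrDefault(key, 0L);
        if (primaryVer != readVer)
            return false; // Validation failed, the transaction must retry.

        // Assumption of this model: a committed write puts the same
        // new version on both copies.
        long next = primaryVer + 1;
        primary.versions.put(key, next);
        backup.versions.put(key, next);
        return true;
    }

    /** Retries read + commit; returns the attempt that succeeded, or -1. */
    int attemptsToCommit(String key, boolean forcePrimary, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long readVer = read(key, forcePrimary);
            if (commit(key, readVer))
                return attempt;
        }
        return -1;
    }
}
```

With the primary at version 1 and the backup at version 2 for the same key (for example, after each copy loaded the value independently), attemptsToCommit("k", false, 5) exhausts every attempt, while attemptsToCommit("k", true, 5) commits on the first try.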
