Igniters,

I'm glad to introduce the Read Repair feature [0], which provides an additional consistency guarantee for Ignite.

1) Why do we need it?
The detailed explanation can be found in IEP-31 [1]. In short, because of bugs, a cluster can end up in an inconsistent state, and we need additional features to handle this case. Currently we are able to check the cluster using the idle_verify [2] feature, but it does not fix the data and does not even tell which entries are broken. Read Repair is a feature that finds out which entries are broken and fixes them.

2) How does it work?
IgniteCache is now able to provide a special proxy [3], withReadRepair(). This proxy guarantees that data is read from all owners and compared. If a consistency violation is found, the data is recovered and a special event is recorded.

3) Naming?
The feature name is based on Cassandra's Read Repair feature [4], which is pretty similar.

4) Limitations which can be fixed in the future?
* MVCC and Near caches are not supported.
* Atomic caches can be checked (a false positive is possible on this check), but can't be recovered.
* Partial entry removal can't be recovered.
* Entries streamed using the data streamer (with an updater that is not "cache.put" based) and entries loaded by cache.load are perceived as inconsistent, since they may have different versions for the same keys.
* Only explicit get operations are supported (getAndReplace, getAndPut, etc. can be supported in the future).

5) What's left?
* SQL/ThinClient/etc. support.
* Metrics (found/repaired).
* A simple per-partition recovery feature able to work in the background, in addition to the per-entry recovery feature.

6) Is the code checked?
* Pull Request #5656 [5] (feature) - has green TC.
* Pull Request #6575 [6] (RunAll with the feature enabled for every get() request) - has a limited number of failures (because of data streamer, cache.load, etc.).

Thoughts?

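
For illustration, the read-and-compare step described under "How it works?" can be modeled roughly as below. This is a toy sketch in plain Java, not the code from the pull request and not Ignite's API: the names ReadRepairSketch, Versioned, getWithReadRepair and violations are hypothetical, each "owner" is just an in-memory map, and the "highest version wins" repair rule is an assumption made for the example (this message does not say how the winning value is chosen).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of read repair. Hypothetical names throughout; not Ignite code. */
final class ReadRepairSketch {
    /** A value plus an update version (stand-in for a cache entry). */
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Keys for which a mismatch was found; stands in for the recorded event. */
    static final List<String> violations = new ArrayList<>();

    /** Two copies match if both are missing or both carry the same version and value. */
    private static boolean same(Versioned a, Versioned b) {
        if (a == null || b == null)
            return a == b;
        return a.version == b.version && a.value.equals(b.value);
    }

    /**
     * Reads the key from every owner and compares the copies. On a mismatch,
     * records the key and overwrites all copies with the highest-versioned one
     * (assumed policy). A missing copy is treated as stale, so this toy cannot
     * tell a lost write from a partial removal.
     */
    static String getWithReadRepair(List<Map<String, Versioned>> owners, String key) {
        Versioned first = null;
        Versioned newest = null;
        boolean firstSeen = false;
        boolean mismatch = false;

        for (Map<String, Versioned> owner : owners) {
            Versioned copy = owner.get(key);

            if (!firstSeen) {
                first = copy;
                firstSeen = true;
            }
            else if (!same(first, copy))
                mismatch = true;

            if (copy != null && (newest == null || copy.version > newest.version))
                newest = copy;
        }

        if (mismatch) {
            violations.add(key);
            for (Map<String, Versioned> owner : owners)
                owner.put(key, newest); // repair: every owner gets the newest copy
        }

        return newest == null ? null : newest.value;
    }

    public static void main(String[] args) {
        Map<String, Versioned> primary = new HashMap<>();
        Map<String, Versioned> backup = new HashMap<>();
        primary.put("k", new Versioned("new", 2));
        backup.put("k", new Versioned("old", 1)); // stale copy on the backup

        List<Map<String, Versioned>> owners = new ArrayList<>();
        owners.add(primary);
        owners.add(backup);

        System.out.println(getWithReadRepair(owners, "k"));
        System.out.println(violations);
    }
}
```

The sketch also hints at why the limitations listed above exist: once copies carry different versions for reasons other than a bug, or a copy is legitimately absent, a plain comparison can no longer tell what the correct state is.
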
[0] https://issues.apache.org/jira/browse/IGNITE-10663
[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
[2] https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
[3] https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
[4] https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
[5] https://github.com/apache/ignite/pull/5656
[6] https://github.com/apache/ignite/pull/6575
