Nikolay,

>> Should we consider its removal in Ignite 3
Don't think so.
My initial ReadRepair implementation uses the entry version to detect
inconsistency. The strategy can be changed later (most likely it will be),
or we can even provide the ability to plug in a custom strategy.
The data streamer and cache.load cases can be supported by a simple value
check as a second phase of checks.

As I mentioned privately, all limitations are limitations of the initial
version and can be eliminated in the future.
Currently, the main goal is to merge the solution so that the work can
continue.
* After the merge, it will be possible to split the work on this task
(metrics, whole-partition fix, data streamers, etc.).
* It seems I found a consistency issue while checking my feature (a backup
can be outdated for some time after tx finish in full_sync).
The merge will make it possible to investigate the bug more deeply and fix
it if confirmed.

On Thu, Jun 20, 2019 at 2:22 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

> Anton.
>
> I'm worried about this limitation:
>
> > Entries streamed using data streamer (using not a "cache.put" based
> > updater) and loaded by cache.load.
>
> As we discussed privately, in these modes
>
> *ALL ENTRIES ON ALL OWNERS WILL HAVE DIFFERENT VERSIONS*
>
> Why do we need these modes in the first place?
> Should we consider its removal in Ignite 3, or should we fix them?
>
> On Thu, 20/06/2019 at 14:15 +0300, Anton Vinogradov wrote:
> > Igniters,
> > I'm glad to introduce the Read Repair feature [0], which provides an
> > additional consistency guarantee for Ignite.
> >
> > 1) Why do we need it?
> > The detailed explanation can be found in IEP-31 [1].
> > In short, because of bugs, it's possible to end up in an inconsistent
> > state.
> > We need additional features to handle this case.
> >
> > Currently we are able to check the cluster using the Idle_verify [2]
> > feature, but it will not fix the data and will not even tell which
> > entries are broken.
> > Read Repair is a feature to understand which entries are broken and to
> > fix them.
> >
> > 2) How does it work?
> > IgniteCache is now able to provide a special proxy [3], withReadRepair().
> > This proxy guarantees that data will be fetched from all owners and
> > compared.
> > In case of a consistency violation, the data will be recovered and
> > a special event recorded.
> >
> > 3) Naming?
> > The feature name is based on Cassandra's Read Repair feature [4], which
> > is pretty similar.
> >
> > 4) Limitations which can be fixed in the future?
> > * MVCC and Near caches are not supported.
> > * Atomic caches can be checked (a false positive is possible on this
> > check), but can't be recovered.
> > * Partial entry removal can't be recovered.
> > * Entries streamed using the data streamer (not using a "cache.put"-based
> > updater) and loaded by cache.load
> > are perceived as inconsistent since they may have different versions for
> > the same keys.
> > * Only explicit get operations are supported (getAndReplace, getAndPut,
> > etc. can be supported in the future).
> >
> > 5) What's left?
> > * SQL/ThinClient/etc. support.
> > * Metrics (found/repaired).
> > * A simple per-partition recovery feature able to work in the background
> > in addition to the per-entry recovery feature.
> >
> > 6) Is the code checked?
> > * Pull Request #5656 [5] (feature) - has green TC.
> > * Pull Request #6575 [6] (RunAll with the feature enabled for every get()
> > request) - has a limited number of failures (because of data streamer,
> > cache.load, etc.).
> >
> > Thoughts?
> >
> > [0] https://issues.apache.org/jira/browse/IGNITE-10663
> > [1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-31+Consistency+check+and+fix
> > [2] https://apacheignite-tools.readme.io/docs/control-script#section-verification-of-partition-checksums
> > [3] https://github.com/apache/ignite/blob/27b6105ecc175b61e0aef59887830588dfc388ef/modules/core/src/main/java/org/apache/ignite/IgniteCache.java#L140
> > [4] https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesReadRepair.html
> > [5] https://github.com/apache/ignite/pull/5656
> > [6] https://github.com/apache/ignite/pull/6575
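
To make the two-phase idea from the reply at the top of this message concrete (compare versions first, then fall back to a plain value check for the data streamer and cache.load cases), here is a rough sketch. It is not code from the pull request: the class ReadRepairSketch, the Copy holder, the long version field, and the Result names are all hypothetical stand-ins chosen for illustration, and the real implementation may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustrative sketch only; none of these types come from Ignite. */
final class ReadRepairSketch {
    /** Hypothetical snapshot of one key's entry as read from a single owner. */
    static final class Copy {
        final long version;  // stand-in for the entry version (assumption)
        final Object value;

        Copy(long version, Object value) {
            this.version = version;
            this.value = value;
        }
    }

    enum Result { CONSISTENT, SAME_VALUES_DIFFERENT_VERSIONS, INCONSISTENT }

    /** Phase 1 compares versions; phase 2 compares values only if versions differ. */
    static Result check(List<Copy> copies) {
        if (copies.isEmpty())
            return Result.CONSISTENT;

        Copy first = copies.get(0);

        // Phase 1: identical versions on all owners are treated as consistent.
        boolean sameVersions = true;
        for (Copy c : copies)
            sameVersions &= (c.version == first.version);
        if (sameVersions)
            return Result.CONSISTENT;

        // Phase 2: versions differ (e.g. streamed or loaded entries),
        // so fall back to comparing the values themselves.
        boolean sameValues = true;
        for (Copy c : copies)
            sameValues &= Objects.equals(c.value, first.value);

        return sameValues ? Result.SAME_VALUES_DIFFERENT_VERSIONS : Result.INCONSISTENT;
    }

    public static void main(String[] args) {
        System.out.println(check(Arrays.asList(new Copy(1, "a"), new Copy(1, "a"))));
        System.out.println(check(Arrays.asList(new Copy(1, "a"), new Copy(2, "a"))));
        System.out.println(check(Arrays.asList(new Copy(1, "a"), new Copy(2, "b"))));
    }
}
```

With a fallback like this, entries whose versions differ but whose values match would not have to be reported as broken, which is the situation described for the data streamer and cache.load.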