[
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721420#comment-16721420
]
Pavel Kovalenko commented on IGNITE-4210:
-----------------------------------------
[~Alexey Kuznetsov]
I've reviewed your solution and have several concerns about it's implementation
and test.
1) Current implentation of cache loading mechanism doesn't fit into requirement
that there should be no ongoing update operations during PME.
2) We're waiting for finishing that operations at
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture#waitPartitionRelease
method . Cache load operations are not covered in partitions release future
and have their custom workflow which is big miss now and this should be fixed.
3) GridDhtInvalidPartitionException is throwed only when affinity for that
partition is changed. What will be happened when some partitions are moved to
other nodes, some not? In that case some of the cache loads will be finished
with exception, some of them not. This will cause of data inconsistency between
nodes after PME is finished.
4) For me test doesn't cover well case when cluster topology is changed. Future
that starts new nodes can be completed before you run load to cache store.
Moreover test don't check that event with ClusterTopologyCheckedException is
fired always when topology is changed. It means that sometimes test checks
negative (expected) scenario, sometimes positive scneario which leads to test
flakiness.
>From my side of view correct solution should be implemented as below:
1) Introduce new cache future in
org.apache.ignite.internal.processors.cache.GridCacheMvccManager (as welll as
locks, transactions, atomic updates futures) that will indicate that cache
loading is in progress.
2) Create and register this future at the beginning of
org.apache.ignite.internal.processors.cache.store.CacheStoreManager#loadAll
method.
3) Future should have ability to cancel. Add this future as a part of
partitionsRelease future and cancel it when waitPartitionsRelease event is
happened.
4) Divide whole keys set to micro-batches in loadAll. At the end of each of
micro-batch, check that CacheLoadFuture is not cancelled by
waitPartitionsRelease.
5) If future was cancelled, immediately finish this future to unblock
waitPartitionsRelease and throw appropriate exception to user, that topology is
changed and operation should be retired.
6) Fix test in a way to have 100% guarantees that it checks negative scenario
with cluster topology changing.
> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose
> data.
> --------------------------------------------------------------------------------
>
> Key: IGNITE-4210
> URL: https://issues.apache.org/jira/browse/IGNITE-4210
> Project: Ignite
> Issue Type: Bug
> Reporter: Anton Vinogradov
> Assignee: Alexey Kuznetsov
> Priority: Major
> Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
> sometimes have failures.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)