[ 
https://issues.apache.org/jira/browse/IGNITE-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721420#comment-16721420
 ] 

Pavel Kovalenko commented on IGNITE-4210:
-----------------------------------------

[~Alexey Kuznetsov] 
I've reviewed your solution and have several concerns about it's implementation 
and test.
1) Current implentation of cache loading mechanism doesn't fit into requirement 
that there should be no ongoing update operations during PME.
2) We're waiting for finishing that operations at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture#waitPartitionRelease
 method . Cache load operations are not covered in partitions release future 
and have their custom workflow which is big miss now and this should be fixed.
3) GridDhtInvalidPartitionException is throwed only when affinity for that 
partition is changed. What will be happened when some partitions are moved to 
other nodes, some not? In that case some of the cache loads will be finished 
with exception, some of them not. This will cause of data inconsistency between 
nodes after PME is finished.
4) For me test doesn't cover well case when cluster topology is changed. Future 
that starts new nodes can be completed before you run load to cache store. 
Moreover test don't check that event with ClusterTopologyCheckedException is 
fired always when topology is changed. It means that sometimes test checks 
negative (expected) scenario, sometimes positive scneario which leads to test 
flakiness. 

>From my side of view correct solution should be implemented as below:
1) Introduce new cache future in 
org.apache.ignite.internal.processors.cache.GridCacheMvccManager (as welll as 
locks, transactions, atomic updates futures)  that will indicate that cache 
loading is in progress.
2) Create and register this future at the beginning of 
org.apache.ignite.internal.processors.cache.store.CacheStoreManager#loadAll 
method.
3) Future should have ability to cancel. Add this future as a part of 
partitionsRelease future and cancel it when waitPartitionsRelease event is 
happened.
4) Divide whole keys set to micro-batches in loadAll. At the end of each of 
micro-batch, check that CacheLoadFuture is not cancelled by 
waitPartitionsRelease.
5) If future was cancelled, immediately finish this future to unblock 
waitPartitionsRelease and throw appropriate exception to user, that topology is 
changed and operation should be retired.
6) Fix test in a way to have 100% guarantees that it checks negative scenario 
with cluster topology changing.

> CacheLoadingConcurrentGridStartSelfTest.testLoadCacheFromStore() test lose 
> data.
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-4210
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4210
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Vinogradov
>            Assignee: Alexey Kuznetsov
>            Priority: Major
>              Labels: MakeTeamcityGreenAgain
>             Fix For: 2.8
>
>
> org.apache.ignite.internal.processors.cache.distributed.CacheLoadingConcurrentGridStartSelfTest#testLoadCacheFromStore
>  sometimes have failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to