[ https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908852#comment-16908852 ]
Ignite TC Bot commented on IGNITE-9562:
---------------------------------------

{panel:title=Branch: [pull/6781/head] Base: [ignite-2.7.6] : Possible Blockers (41)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Platform C++ (Linux Clang){color} [[tests 0 Exit Code , Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503638]]

{color:#d04437}Platform .NET (Inspections)*{color} [[tests 0 Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503679]]

{color:#d04437}Platform C++ (Linux)*{color} [[tests 0 Exit Code , Failure on metric |https://ci.ignite.apache.org/viewLog.html?buildId=4503632]]

{color:#d04437}MVCC Cache{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503639]]
* IgniteCacheMvccTestSuite: CacheMvccPartitionedCoordinatorFailoverTest.testGetReadInsideTxInProgressCoordinatorFails_ReadDelay - Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache (Restarts) 2{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503654]]
* IgniteCacheRestartTestSuite2: IgniteCachePutAllRestartTest.testStopNode - Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}PDS 1{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=4503673]]
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyCachesAbruptly - Test has low fail rate in base branch 0,0% and is not flaky
* IgnitePdsTestSuite: IgnitePdsDestroyCacheTest.testDestroyGroupCachesAbruptly - Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache 7{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503662]]
* IgniteCacheTestSuite7: CacheMetricsManageTest.testJmxPdsStatisticsEnable - Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Cache 6{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503661]]
* IgniteCacheTestSuite6: TxRollbackOnTimeoutNearCacheTest.testSimple - Test has low fail rate in base branch 0,0% and is not flaky

{color:#d04437}Platform C++ (Win x64 / Release){color} [[tests 0 BuildFailureOnMessage |https://ci.ignite.apache.org/viewLog.html?buildId=4503633]]

{color:#d04437}ZooKeeper (Discovery) 1{color} [[tests 30|https://ci.ignite.apache.org/viewLog.html?buildId=4503630]]
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testSegmentation2 - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testSegmentation3 - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testConcurrentStartStop2_EventsThrottle - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testCustomEventsSimple1_5_Nodes - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testMultipleClusters - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testConnectionRestore_Coordinator1_1 - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testWithPersistence1 - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testWithPersistence2 - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testStartStop1 - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testStartStop2 - Test has low fail rate in base branch 0,0% and is not flaky
* ZookeeperDiscoverySpiTestSuite1: ZookeeperDiscoverySpiTest.testDeployService2 - Test has low fail rate in base branch 0,0% and is not flaky
...
and 19 tests blockers

{color:#d04437}Cache (Failover) 1{color} [[tests 1|https://ci.ignite.apache.org/viewLog.html?buildId=4503648]]
* IgniteCacheFailoverTestSuite: IgniteAtomicLongChangingTopologySelfTest.testClientQueueCreateCloseFailover - Test has low fail rate in base branch 0,0% and is not flaky

{panel}
[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=4503707&buildTypeId=IgniteTests24Java8_RunAll]

> Destroyed cache that resurrected on an old offline node breaks PME
> ------------------------------------------------------------------
>
>                 Key: IGNITE-9562
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9562
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.5
>            Reporter: Pavel Kovalenko
>            Assignee: Eduard Shangareev
>            Priority: Critical
>             Fix For: 2.8, 2.7.6
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Given:
> 2 nodes, persistence enabled.
> 1) Stop 1 node
> 2) Destroy cache through client
> 3) Start stopped node
> When the stopped node rejoins the cluster, it starts all caches that it had
> seen before stopping.
> If such a cache was destroyed cluster-wide in the meantime, this breaks the
> crash recovery process or PME.
> Root cause: we don't start/collect caches from the stopped node on the other
> part of the cluster.
> With the PARTITIONED cache mode, this scenario breaks crash recovery:
> {noformat}
> java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
>     at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> With the REPLICATED cache mode, this scenario breaks the PME coordinator process:
> {noformat}
> [2018-09-12 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager] Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee25100001, messageType=class o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage]
> java.lang.AssertionError: 3080586
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2249)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:2249)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1628)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:141)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:368)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2999)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2978)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> One possible solution: we shouldn't start such caches on resurrected nodes.
> We should save the history of cache changes somewhere and spread it
> cluster-wide to joining nodes.
> If the cache was only stopped, we can do nothing and start it later, when a
> cache start request is received.
> If the cache was stopped and destroyed, we should clean the persistence
> data for that cache.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
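
The solution proposed in the issue description can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Ignite code and not the fix that was merged: the class CacheJoinReconciler, the enums Change and Action, and the method reconcile are all hypothetical names introduced for illustration. It assumes the cluster hands the joining node a history that maps each cache name to the last change made while the node was offline, and shows how the node could then decide what to do with every cache it finds in its own persistence.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model (not Ignite API): plans what a rejoining node does with its persisted caches. */
final class CacheJoinReconciler {
    /** Last cluster-wide change recorded for a cache while the node was offline (assumed history format). */
    enum Change { STOPPED, DESTROYED }

    /** Decision for a cache found in the joining node's local persistence. */
    enum Action { START, DEFER_UNTIL_START_REQUEST, CLEAN_PERSISTENCE }

    /**
     * @param persistedCaches Names of caches the joining node has persistence data for.
     * @param history Cache changes the cluster recorded while the node was offline.
     * @return Planned action per persisted cache, ordered by cache name.
     */
    static Map<String, Action> reconcile(Set<String> persistedCaches, Map<String, Change> history) {
        Map<String, Action> plan = new TreeMap<>();

        for (String cache : persistedCaches) {
            Change change = history.get(cache);

            if (change == null)
                plan.put(cache, Action.START); // No recorded change: the cache still exists cluster-wide.
            else if (change == Change.STOPPED)
                plan.put(cache, Action.DEFER_UNTIL_START_REQUEST); // Only stopped: do nothing until a start request.
            else
                plan.put(cache, Action.CLEAN_PERSISTENCE); // Destroyed: never start it, drop its local data.
        }

        return plan;
    }

    public static void main(String[] args) {
        // Illustrative cache names only.
        Map<String, Change> history = new TreeMap<>();
        history.put("orders", Change.DESTROYED);
        history.put("sessions", Change.STOPPED);

        Set<String> persisted = new TreeSet<>(Arrays.asList("orders", "sessions", "users"));

        reconcile(persisted, history).forEach((cache, action) -> System.out.println(cache + " -> " + action));
    }
}
```

In this model a destroyed cache is never started on the resurrected node, so it would not reach the partition state restore or the single-message processing seen in the stack traces above. How the history is stored and distributed to joining nodes is left open here, as it is in the description.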