Ivan Daschinskiy created IGNITE-13690:
-----------------------------------------

             Summary: Failed to init coordinator caches on concurrent start of nodes with different cache configurations
                 Key: IGNITE-13690
                 URL: https://issues.apache.org/jira/browse/IGNITE-13690
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.9
            Reporter: Ivan Daschinskiy

Scenario:
1. Start nodes with different cache configurations simultaneously (for simplicity, let the client nodes have caches configured and the server nodes have none).
2. While processing the first exchange, the coordinator fails with:
{code:java}
[2020-11-10 13:23:57,232][ERROR][start-node-1][DifferentCacheConfigurationConcurrentStart0] Got exception while starting (will rollback startup routine).
java.lang.AssertionError: Invalid exchange futures state [cur=6, total=7]
	at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$17.applyx(CacheAffinitySharedManager.java:1964)
	at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager$17.applyx(CacheAffinitySharedManager.java:1935)
	at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.lambda$forAllRegisteredCacheGroups$e0a6939d$1(CacheAffinitySharedManager.java:1265)
	at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11157)
	at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11059)
	at org.apache.ignite.internal.util.IgniteUtils.doInParallel(IgniteUtils.java:11039)
	at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.forAllRegisteredCacheGroups(CacheAffinitySharedManager.java:1264)
	at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.initCoordinatorCaches(CacheAffinitySharedManager.java:1935)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCoordinatorCaches(GridDhtPartitionsExchangeFuture.java:716)
	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:850)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3175)
	at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3021)
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
	at java.lang.Thread.run(Thread.java:748)
{code}

The main reason is a race on creating {{LocalJoinCachesContext}}, so the local join caches differ from the caches registered by other nodes.

Reproducers for the ZooKeeper and ring discoveries are attached.

NB! Not always reproducible. To increase the probability of failure, add a sleep in {{GridDhtPartitionsExchangeFuture#init}}:
{code:java}
public void init(boolean newCrd) throws IgniteInterruptedCheckedException {
    if (newCrd)
        U.sleep(500);
{code}
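
To illustrate the kind of race described above, here is a minimal sketch that is not Ignite code. It replays, single-threaded and deterministically, one interleaving in which a node snapshots its local join caches before a cache from another node has been registered. The class {{JoinSnapshotRaceSketch}}, its methods, and the cache names are hypothetical; the real {{LocalJoinCachesContext}} logic is more involved and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model for illustration only; none of these names exist in Ignite.
public class JoinSnapshotRaceSketch {
    /** Caches known cluster-wide (stands in for the registered caches). */
    private final Set<String> registered = new TreeSet<>();

    /** Snapshot taken at local join (stands in for the local join caches). */
    private final Set<String> localJoinSnapshot = new TreeSet<>();

    /** A cache descriptor arrives, possibly from another concurrently starting node. */
    void register(String cacheName) {
        registered.add(cacheName);
    }

    /** Copies the currently registered caches into the local join snapshot. */
    void snapshotLocalJoin() {
        localJoinSnapshot.clear();
        localJoinSnapshot.addAll(registered);
    }

    /** Returns the registered caches that are missing from the local join snapshot. */
    List<String> missingFromSnapshot() {
        List<String> missing = new ArrayList<>(registered);
        missing.removeAll(localJoinSnapshot);
        return missing;
    }

    /**
     * Replays one interleaving of "take snapshot" and "register the last cache".
     * With snapshotFirst == true the snapshot is stale, which models the race.
     */
    public static List<String> replay(boolean snapshotFirst) {
        JoinSnapshotRaceSketch node = new JoinSnapshotRaceSketch();

        for (int i = 0; i < 6; i++)
            node.register("cache-" + i);

        if (snapshotFirst) {
            node.snapshotLocalJoin();
            node.register("cache-6"); // Arrives after the snapshot was taken.
        }
        else {
            node.register("cache-6");
            node.snapshotLocalJoin();
        }

        return node.missingFromSnapshot();
    }

    public static void main(String[] args) {
        System.out.println("snapshot first: missing=" + replay(true));
        System.out.println("register first: missing=" + replay(false));
    }
}
```

In the first ordering the snapshot lacks {{cache-6}}; in the second the two views agree. The suggested {{U.sleep(500)}} plausibly widens the window in which the first ordering can occur, which would explain why it raises the failure rate.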