[
https://issues.apache.org/jira/browse/IGNITE-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Daschinsky updated IGNITE-19115:
-------------------------------------
Fix Version/s: 2.15
> Possible deadlock in handling pending cache messages when the cache is
> recreated
> --------------------------------------------------------------------------------
>
> Key: IGNITE-19115
> URL: https://issues.apache.org/jira/browse/IGNITE-19115
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.14
> Reporter: Vyacheslav Koptilin
> Assignee: Vyacheslav Koptilin
> Priority: Major
> Fix For: 2.15
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Consider the following scenario.
> Precondition:
> - There is a cluster of two server nodes (node A, the coordinator, and node B)
>   and an atomic cache that resides on those nodes.
> - The current topology version is (x, y).
> 1. Node B initiates putting a new key-value pair into the atomic cache. Assume
>    the primary partition that owns the key resides on node A.
> 2. The previous step requires acquiring a gateway lock for the corresponding
>    cache (GridCacheGateway read lock) and registering
>    GridNearAtomicSingleUpdateFuture in the MVCC manager. It is important to
>    note that the cache future does not acquire the topology lock and so should
>    not block PME.
> 3. Concurrently, node A initiates destroying the cache. The corresponding PME
>    completes successfully on the coordinator node but is blocked on node B
>    because the gateway lock is already held:
> {noformat}
> Thread [name="sys-#105%dht.IgniteCacheRecreateTest1%", id=123, state=TIMED_WAITING, blockCnt=0, waitCnt=350]
> at java.lang.Thread.sleep(Native Method)
> at o.a.i.i.util.IgniteUtils.sleep(IgniteUtils.java:8316)
> at o.a.i.i.processors.cache.GridCacheGateway.onStopped(GridCacheGateway.java:324)
> at o.a.i.i.processors.cache.GridCacheProcessor.stopGateway(GridCacheProcessor.java:2582)
> at o.a.i.i.processors.cache.GridCacheProcessor.lambda$processCacheStopRequestOnExchangeDone$1c59e5cf$1(GridCacheProcessor.java:2776)
> at o.a.i.i.processors.cache.GridCacheProcessor$$Lambda$714/770930142.apply(Unknown Source)
> at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11628)
> at o.a.i.i.util.IgniteUtils.doInParallel(IgniteUtils.java:11530)
> at o.a.i.i.processors.cache.GridCacheProcessor.processCacheStopRequestOnExchangeDone(GridCacheProcessor.java:2755)
> at o.a.i.i.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2945)
> at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:2528)
> at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processFullMessage(GridDhtPartitionsExchangeFuture.java:4785)
> at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$1500(GridDhtPartitionsExchangeFuture.java:161)
> at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4453)
> at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$4.apply(GridDhtPartitionsExchangeFuture.java:4441)
> at o.a.i.i.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:464)
> at o.a.i.i.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:355)
> at o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveFullMessage(GridDhtPartitionsExchangeFuture.java:4441)
> at o.a.i.i.processors.cache.GridCachePartitionExchangeManager.processFullPartitionUpdate(GridCachePartitionExchangeManager.java:1991)
> at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:469)
> at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$3.onMessage(GridCachePartitionExchangeManager.java:454)
> at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3765)
> at o.a.i.i.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:3744)
> at o.a.i.i.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
> at o.a.i.i.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
> at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
> at o.a.i.i.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
> at o.a.i.i.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
> at o.a.i.i.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
> at o.a.i.i.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
> at o.a.i.i.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
> at o.a.i.i.managers.communication.GridIoManager.access$5300(GridIoManager.java:243)
> at o.a.i.i.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
> at o.a.i.i.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> {noformat}
> 4. Node A initiates creating a new cache with the same name as the previously
>    destroyed one.
> 5. Node A receives the cache update message but cannot process it, because a
>    new cache (a cache with the same cacheId) is starting, so processing of
>    this message has to be postponed until PME is completed. (In this case a
>    GridDhtForceKeysFuture is created, so the near node will not receive a
>    response and will not be able to complete the previous exchange future;
>    see IGNITE-10251.)
> 6. The new PME on node B cannot proceed because the cache-destroy PME from
>    step 3 is still blocked on that node.
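
The lock interaction behind the blocked cache stop can be illustrated with a small JDK-only model. This is a sketch under stated assumptions, not Ignite code: a plain ReentrantReadWriteLock stands in for the cache gateway, a CompletableFuture stands in for the update response from the primary node, and the class and method names (GatewayDeadlockSketch, stopAcquiresGateway) are hypothetical. It shows only that a "cache stop" needing the write lock cannot proceed while an in-flight update still holds the read lock and waits for a response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GatewayDeadlockSketch {
    /**
     * Simplified model (not Ignite's implementation).
     * Returns true if the simulated cache stop got the write lock within timeoutMs
     * while the simulated update was still waiting for its response.
     */
    static boolean stopAcquiresGateway(long timeoutMs) throws Exception {
        // Stand-in for the cache gateway lock.
        ReentrantReadWriteLock gateway = new ReentrantReadWriteLock();
        // Stand-in for the response the near node expects from the primary node.
        CompletableFuture<Void> updateResponse = new CompletableFuture<>();
        CountDownLatch entered = new CountDownLatch(1);

        // Simulated update on "node B": enter the gateway, then wait for the response.
        Thread update = new Thread(() -> {
            gateway.readLock().lock();
            try {
                entered.countDown();
                updateResponse.join();
            } finally {
                gateway.readLock().unlock();
            }
        });
        update.start();
        entered.await();

        // Simulated cache stop: it needs the write lock, which the reader still blocks.
        boolean stopped = gateway.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (stopped)
            gateway.writeLock().unlock();

        // In the reported scenario the response never arrives, because it depends on
        // the next PME. The model completes it here only so that the thread can exit.
        updateResponse.complete(null);
        update.join();

        return stopped;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stop acquired gateway: " + stopAcquiresGateway(300));
    }
}
```

Running main prints "stop acquired gateway: false". In the scenario above there is no equivalent of the final complete(null) call, because the response is deferred until a PME that is itself queued behind the blocked cache stop, which is what closes the cycle.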