[ https://issues.apache.org/jira/browse/IGNITE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrey Mashenkov reassigned IGNITE-13082:
-----------------------------------------
Assignee: Andrey Mashenkov
> Deadlock between topology update and CQ registration.
> -----------------------------------------------------
>
> Key: IGNITE-13082
> URL: https://issues.apache.org/jira/browse/IGNITE-13082
> Project: Ignite
> Issue Type: Task
> Affects Versions: 2.7
> Reporter: Andrey Mashenkov
> Assignee: Andrey Mashenkov
> Priority: Major
> Labels: deadlock
> Fix For: 2.9
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Relevant stack traces:
>
> {code:java}
> "sys-stripe-0-#65483%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85739 prio=5 os_prio=0 tid=0x00007fda80139800 nid=0x5618 waiting on condition [0x00007fdc018e8000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000fc138298> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockListenerReadLock(GridCacheMapEntry.java:5032)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(GridCacheMapEntry.java:2262)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(GridDhtAtomicCache.java:2574)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(GridDhtAtomicCache.java:2034)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1854)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1668)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3239)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:139)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:273)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:268)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1626)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1246)
>     at org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:142)
>     at org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1137)
>     at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
>     at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>
> {code:java}
> "disco-notifier-worker-#65517%cache.BinaryMetadataRegistrationInsideEntryProcessorTest0%" #85777 prio=5 os_prio=0 tid=0x00007fda800a9800 nid=0x5639 waiting on condition [0x00007fdc006d9000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000fbde5f30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.localUpdateCounters(GridDhtPartitionTopologyImpl.java:2810)
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$2.onRegister(CacheContinuousQueryHandler.java:379)
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.registerListener(CacheContinuousQueryManager.java:946)
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.register(CacheContinuousQueryHandler.java:628)
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.registerHandler(GridContinuousProcessor.java:1818)
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.processStartRequest(GridContinuousProcessor.java:1444)
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.access$400(GridContinuousProcessor.java:113)
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:205)
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor$2.onCustomEvent(GridContinuousProcessor.java:196)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:639)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:510)
>     - locked <0x00000000fb58bbc8> (a java.lang.Object)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4$$Lambda$91/1259207939.run(Unknown Source)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2650)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2688)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
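> The problematic code is in {{CacheContinuousQueryManager.registerListener}}.
> It first acquires the CQ listener write lock and then, while reading the
> partition update counters, acquires the topology read lock. A cache update
> takes the same locks in the opposite order: first the topology read lock,
> then the CQ listener read lock. If a third thread requests the topology
> write lock in between, the two threads deadlock. The writer waits for the
> updating thread to release its topology read lock. The registering thread
> cannot get the topology read lock either, because a nonfair
> {{ReentrantReadWriteLock}} makes a new reader wait when a writer is at the
> head of its queue. The updating thread, in turn, waits for the CQ listener
> read lock, which the registering thread holds in write mode.
> The issue seems to have been introduced by IGNITE-10755, which placed the
> topology read lock acquisition inside the CQ listener write lock.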
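The lock-order inversion described above can be modelled outside Ignite with two plain `ReentrantReadWriteLock` instances. This is a minimal sketch, not Ignite code: `topLock` and `lsnrLock` are hypothetical stand-ins for the partition topology lock and the CQ listener lock, and the class and method names are made up for illustration. Timed `tryLock` calls replace the blocking `lock()` calls so the program reports the cycle instead of hanging. It assumes the JDK's default nonfair mode, in which a new reader waits when a writer is at the head of the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CqTopologyDeadlockDemo {

    /** Holds {@code first}, makes one timed attempt at {@code second}, records whether it was blocked. */
    static void lockThenTry(Lock first, Lock second, CountDownLatch held, CountDownLatch go,
                            CountDownLatch attempted, AtomicBoolean blocked) {
        first.lock();
        try {
            held.countDown();
            go.await();
            // A timed tryLock queues like lock() but gives up, so the demo cannot hang.
            boolean acquired = second.tryLock(500, TimeUnit.MILLISECONDS);
            blocked.set(!acquired);
            if (acquired) {
                second.unlock();
            }
            attempted.countDown();
            // Keep the first lock until both threads have tried, so neither attempt
            // is unblocked by the other thread giving up early.
            attempted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    /** Returns true if both threads were blocked on their second lock at the same time. */
    static boolean cycleForms(boolean queueTopologyWriter) {
        // Hypothetical stand-ins; both use the default nonfair mode.
        ReentrantReadWriteLock topLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock lsnrLock = new ReentrantReadWriteLock();
        CountDownLatch held = new CountDownLatch(2);
        CountDownLatch go = new CountDownLatch(1);
        CountDownLatch attempted = new CountDownLatch(2);
        AtomicBoolean updaterBlocked = new AtomicBoolean();
        AtomicBoolean registrarBlocked = new AtomicBoolean();

        // Cache update path: topology read lock, then CQ listener read lock.
        Thread updater = new Thread(() -> lockThenTry(topLock.readLock(), lsnrLock.readLock(),
            held, go, attempted, updaterBlocked));
        // CQ registration path: CQ listener write lock, then topology read lock.
        Thread registrar = new Thread(() -> lockThenTry(lsnrLock.writeLock(), topLock.readLock(),
            held, go, attempted, registrarBlocked));
        Thread writer = null;
        try {
            updater.start();
            registrar.start();
            held.await();
            if (queueTopologyWriter) {
                // A third thread requests the topology write lock and queues behind the updater's read lock.
                writer = new Thread(() -> {
                    try {
                        if (topLock.writeLock().tryLock(5, TimeUnit.SECONDS)) {
                            topLock.writeLock().unlock();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                writer.start();
                while (!topLock.hasQueuedThread(writer)) {
                    Thread.sleep(1); // wait until the writer is parked in the lock queue
                }
            }
            go.countDown();
            updater.join();
            registrar.join();
            if (writer != null) {
                writer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return updaterBlocked.get() && registrarBlocked.get();
    }

    public static void main(String[] args) {
        System.out.println("queued topology writer: cycle = " + cycleForms(true));
        System.out.println("no queued writer:       cycle = " + cycleForms(false));
    }
}
```

Under these assumptions the first scenario is expected to report a cycle and the second not: without a queued writer, the registrar's topology read lock is shared with the updater's, so registration can finish and release the listener write lock. The queued writer is what closes the loop.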
--
This message was sent by Atlassian Jira
(v8.3.4#803005)