Sergey Chugunov created IGNITE-28242:
----------------------------------------
Summary: Possible deadlock as GridCacheMapEntry lock is acquired before cp readlock
Key: IGNITE-28242
URL: https://issues.apache.org/jira/browse/IGNITE-28242
Project: Ignite
Issue Type: Bug
Affects Versions: 2.17
Reporter: Sergey Chugunov
Fix For: 2.19
In one instance, the following stack trace was observed (common frames omitted):
{code:java}
"client-connector-#44" #83 prio=5 os_prio=0 cpu=5738716.41ms elapsed=1685639.62s tid=0x00007f9b90074000 nid=0x2349 waiting on condition [0x00007f9b704e7000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@…/Native Method)
    - parking to wait for <0x0000000658588260> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base@…/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@…/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(java.base@…/AbstractQueuedSynchronizer.java:1009)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(java.base@…/AbstractQueuedSynchronizer.java:1324)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(java.base@…/ReentrantReadWriteLock.java:738)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointReadWriteLock.readLock(CheckpointReadWriteLock.java:69)
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:117)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1602)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:460)   <<--<<-- cp readLock is requested here
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGet0(GridCacheMapEntry.java:670)   <<--<<-- lock in this GridCacheMapEntry is acquired here
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGetVersioned(GridCacheMapEntry.java:608)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.enlistRead(GridNearTxLocal.java:2370)
    at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.getAllAsync(GridNearTxLocal.java:1839)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache$1.op(GridDhtColocatedCache.java:201)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter$AsyncOp.op(GridCacheAdapter.java:5071)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOp(GridCacheAdapter.java:3934)
    at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:199)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAsync(GridCacheAdapter.java:4169)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1399)
{code}
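The first thread is parked in ReentrantReadWriteLock$ReadLock.lock on a NonfairSync, so it cannot get the cp readlock even though it only needs shared access. In the current OpenJDK implementation, a nonfair ReentrantReadWriteLock makes a new reader wait whenever the first queued thread is a writer; this is a heuristic against writer starvation. The sketch uses plain JDK locks, not Ignite code, and the class name is hypothetical. It isolates that one effect: a thread holds the read lock, a stand-in for the checkpointer queues for the write lock, and a fresh reader then times out even though only a read lock is held.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueuedWriterBlocksReader {
    /** Returns whether a new reader got the read lock while a writer was queued. */
    public static boolean readerAcquiredWhileWriterQueued() throws InterruptedException {
        ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock(); // nonfair by default
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stand-in for a thread that already holds the checkpoint read lock.
        Thread holder = new Thread(() -> {
            cpLock.readLock().lock();
            try {
                readHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cpLock.readLock().unlock();
            }
        });
        // Stand-in for the checkpointer: queues for the write lock.
        Thread writer = new Thread(() -> {
            cpLock.writeLock().lock();
            cpLock.writeLock().unlock();
        });

        holder.start();
        readHeld.await();
        writer.start();
        while (!cpLock.hasQueuedThread(writer))
            Thread.onSpinWait();
        Thread.sleep(50); // small margin for the writer's queue node to be fully linked

        // A fresh reader: the lock is only read-held, but the queued writer goes first.
        boolean acquired = cpLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired)
            cpLock.readLock().unlock();

        release.countDown(); // holder releases its read lock, then the writer proceeds
        holder.join();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader acquired: " + readerAcquiredWhileWriterQueued());
    }
}
{code}

On current OpenJDK builds the method is expected to return false. The sketch needs Java 9 or later for Thread.onSpinWait, and the 50 ms pause is a practical margin, not a guarantee.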
At the same time, another thread in the same thread dump was holding the cp readlock and waiting for a lock on a GridCacheMapEntry:
{code}
"client-connector-#49" #100 prio=5 os_prio=0 cpu=5745735.56ms elapsed=1685597.78s tid=0x00007f9b9409d800 nid=0x2390 waiting on condition [0x00007f9b5fa0e000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@…/Native Method)
    - parking to wait for <0x00000007ee8095e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base@…/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@…/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@…/AbstractQueuedSynchronizer.java:917)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@…/AbstractQueuedSynchronizer.java:1240)
    at java.util.concurrent.locks.ReentrantLock.lock(java.base@…/ReentrantLock.java:267)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:4164)
    at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.obsolete(GridCacheMapEntry.java:2088)
    at org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:134)
    at org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:70)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:96)
    at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:918)
    at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:434)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.lockMultiple(IgniteTxManager.java:1943)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.prepareTx(IgniteTxManager.java:1162)
    at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userPrepare(IgniteTxLocalAdapter.java:403)
{code}
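These two threads may be blocking each other, with the checkpointer thread completing the cycle. The first thread holds the GridCacheMapEntry lock and waits for the cp readlock. The second thread holds the cp readlock and waits for the lock on the same entry. The checkpointer cannot acquire the cp write lock while the second thread holds its readlock. Its pending write request, in turn, prevents the first thread from acquiring the cp readlock, because a nonfair ReentrantReadWriteLock makes new readers wait while a writer is first in the queue. As a result, none of the three threads can make progress.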
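The suspected cycle can be reproduced with plain JDK locks. This sketch is a model under assumptions, not Ignite code, and the class and method names are hypothetical. A ReentrantLock stands in for the GridCacheMapEntry lock, and a default (nonfair) ReentrantReadWriteLock stands in for the checkpoint lock. Real code uses untimed lock() and would hang forever. Here, timed tryLock() calls let each role report failure instead, so the program terminates.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointDeadlockDemo {
    // Timed acquisition, so a would-be hang is reported as "false" instead.
    private static boolean tryLock(Lock lock, long millis) {
        try {
            return lock.tryLock(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {get got cp read lock, tx got entry lock, checkpointer got cp write lock}. */
    public static boolean[] run() throws InterruptedException {
        ReentrantLock entryLock = new ReentrantLock();                // models a GridCacheMapEntry lock
        ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock(); // models the cp lock (nonfair)
        CountDownLatch cpReadHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean txGotEntry = new AtomicBoolean();
        AtomicBoolean cpGotWrite = new AtomicBoolean();

        // Like "client-connector-#49": cp read lock first, then the entry lock.
        Thread tx = new Thread(() -> {
            cpLock.readLock().lock();
            try {
                cpReadHeld.countDown();
                boolean got = tryLock(entryLock, 200);
                txGotEntry.set(got);
                if (got)
                    entryLock.unlock();
                release.await(); // keep holding the cp read lock until the demo ends
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cpLock.readLock().unlock();
            }
        });
        // Checkpointer: needs the cp write lock.
        Thread checkpointer = new Thread(() -> {
            boolean got = tryLock(cpLock.writeLock(), 1_000);
            cpGotWrite.set(got);
            if (got)
                cpLock.writeLock().unlock();
        });

        boolean getGotCpRead;
        // The calling thread plays "client-connector-#44": entry lock first, then cp read lock.
        entryLock.lock();
        try {
            tx.start();
            cpReadHeld.await();
            checkpointer.start();
            while (!cpLock.hasQueuedThread(checkpointer))
                Thread.onSpinWait();
            Thread.sleep(50); // small margin for the writer to be fully enqueued
            getGotCpRead = tryLock(cpLock.readLock(), 200);
            if (getGotCpRead)
                cpLock.readLock().unlock();
            checkpointer.join(); // times out while tx still holds the cp read lock
        } finally {
            release.countDown();
            entryLock.unlock();
        }
        tx.join();
        return new boolean[] {getGotCpRead, txGotEntry.get(), cpGotWrite.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.printf("get->cpRead=%b, tx->entry=%b, checkpointer->cpWrite=%b%n", r[0], r[1], r[2]);
    }
}
{code}

If all three flags come back false, each role was waiting on another. The get path holds the entry lock, the tx path holds the cp read lock, and the checkpointer's queued write request keeps the get path from taking a new read lock.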
We need to investigate the order in which these locks are acquired and eliminate the possible deadlocks.
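A conventional way to rule out such a cycle is a single global lock order: always take the cp readlock before any GridCacheMapEntry lock, which is the order the second thread already follows. This is offered as an assumption about the direction of a fix, not a confirmed change to Ignite. The helpers withEntry and checkpoint below are hypothetical. With this order, a thread waiting for the cp readlock holds no entry lock, so the checkpointer and the other readers can always drain.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

public class OrderedLocking {
    // Models the checkpoint read/write lock (nonfair by default, as in the dump).
    private final ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();

    // Hypothetical helper: the cp read lock is always taken BEFORE the entry lock.
    <T> T withEntry(ReentrantLock entryLock, Supplier<T> body) {
        cpLock.readLock().lock();
        try {
            entryLock.lock();
            try {
                return body.get();
            } finally {
                entryLock.unlock();
            }
        } finally {
            cpLock.readLock().unlock();
        }
    }

    // Hypothetical checkpoint step: takes the cp write lock and never an entry lock.
    void checkpoint(Runnable work) {
        cpLock.writeLock().lock();
        try {
            work.run();
        } finally {
            cpLock.writeLock().unlock();
        }
    }

    /** Runs two entry paths and a checkpointer concurrently; true if all finish in time. */
    public static boolean runAll(int iterations) throws InterruptedException {
        OrderedLocking db = new OrderedLocking();
        ReentrantLock entryLock = new ReentrantLock();
        AtomicInteger entryOps = new AtomicInteger();
        AtomicInteger checkpoints = new AtomicInteger();

        // Used for both the "get" path and the "tx prepare" path.
        Runnable entryPath = () -> {
            for (int i = 0; i < iterations; i++)
                db.withEntry(entryLock, entryOps::incrementAndGet);
        };
        Runnable checkpointer = () -> {
            for (int i = 0; i < iterations; i++)
                db.checkpoint(checkpoints::incrementAndGet);
        };

        Thread[] threads = {new Thread(entryPath), new Thread(entryPath), new Thread(checkpointer)};
        for (Thread t : threads) {
            t.setDaemon(true); // a hung thread must not keep the JVM alive
            t.start();
        }
        long deadline = System.currentTimeMillis() + 10_000;
        for (Thread t : threads)
            t.join(Math.max(1, deadline - System.currentTimeMillis()));
        for (Thread t : threads)
            if (t.isAlive())
                return false;
        return entryOps.get() == 2 * iterations && checkpoints.get() == iterations;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed: " + runAll(10_000));
    }
}
{code}

The ordering only helps if every path that locks an entry respects it. Paths like innerGet0 followed by unswap would then need the cp readlock to be taken before the entry lock, not inside it.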