Sergey Chugunov created IGNITE-28242:
----------------------------------------

             Summary: Possible deadlock as GridCacheMapEntry lock is acquired before cp readlock
                 Key: IGNITE-28242
                 URL: https://issues.apache.org/jira/browse/IGNITE-28242
             Project: Ignite
          Issue Type: Bug
    Affects Versions: 2.17
            Reporter: Sergey Chugunov
             Fix For: 2.19


In one instance a stack trace was observed of a thread holding a lock on a GridCacheMapEntry and waiting for the cp read lock (common parts are omitted):

{code:java}
"client-connector-#44" #83 prio=5 os_prio=0 cpu=5738716.41ms elapsed=1685639.62s tid=0x00007f9b90074000 nid=0x2349 waiting on condition  [0x00007f9b704e7000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x0000000658588260> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared([email protected]/AbstractQueuedSynchronizer.java:1009)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared([email protected]/AbstractQueuedSynchronizer.java:1324)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock([email protected]/ReentrantReadWriteLock.java:738)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointReadWriteLock.readLock(CheckpointReadWriteLock.java:69)
        at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:117)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1602)
        at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:460)  <<--<<-- cp readLock is requested here
        at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGet0(GridCacheMapEntry.java:670)  <<--<<-- lock in this GridCacheMapEntry is acquired here
        at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGetVersioned(GridCacheMapEntry.java:608)
        at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.enlistRead(GridNearTxLocal.java:2370)
        at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.getAllAsync(GridNearTxLocal.java:1839)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache$1.op(GridDhtColocatedCache.java:201)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter$AsyncOp.op(GridCacheAdapter.java:5071)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOp(GridCacheAdapter.java:3934)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:199)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAsync(GridCacheAdapter.java:4169)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1399)
{code}

At the same time, another thread was found in the thread dump holding the cp read lock and waiting for a lock on a GridCacheMapEntry:

{code}
"client-connector-#49" #100 prio=5 os_prio=0 cpu=5745735.56ms elapsed=1685597.78s tid=0x00007f9b9409d800 nid=0x2390 waiting on condition  [0x00007f9b5fa0e000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x00000007ee8095e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued([email protected]/AbstractQueuedSynchronizer.java:917)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:1240)
        at java.util.concurrent.locks.ReentrantLock.lock([email protected]/ReentrantLock.java:267)
        at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:4164)
        at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.obsolete(GridCacheMapEntry.java:2088)
        at org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:134)
        at org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:70)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:96)
        at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:918)
        at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:434)
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.lockMultiple(IgniteTxManager.java:1943)
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.prepareTx(IgniteTxManager.java:1162)
        at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userPrepare(IgniteTxLocalAdapter.java:403)
{code}

It is possible that these two threads block each other while the checkpointer thread prevents one of them from acquiring the cp read lock and thus making progress: the first thread holds the entry lock and waits for the cp read lock, which is not granted because the checkpointer is already queued for the cp write lock; the checkpointer in turn waits for the second thread to release its cp read lock; and the second thread waits for the entry lock held by the first thread.
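
For illustration, below is a minimal standalone sketch of the suspected wait cycle (toy code, not Ignite internals; the class name, thread names and sleeps are made up just to force the interleaving): one thread takes an entry lock and then requests the cp read lock, another thread takes the cp read lock and then requests the same entry lock, and a checkpointer thread queued for the cp write lock keeps the new reader from entering, so none of the three can proceed.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy reproduction of the suspected wait cycle; all names are made up for illustration. */
public class CpEntryLockInversionSketch {
    /** Stands in for the cp read/write lock (CheckpointReadWriteLock). */
    static final ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();

    /** Stands in for the per-entry ReentrantLock inside a GridCacheMapEntry. */
    static final ReentrantLock entryLock = new ReentrantLock();

    public static void main(String[] args) throws Exception {
        // Like client-connector-#44: entry lock first, cp read lock second.
        Thread getter = new Thread(() -> {
            entryLock.lock();              // innerGet0: entry lock acquired
            try {
                sleep(200);                // let the other two threads line up
                cpLock.readLock().lock();  // unswap: parks, a writer is already queued
                cpLock.readLock().unlock();
            }
            finally {
                entryLock.unlock();
            }
        }, "getter");

        // Like client-connector-#49: cp read lock first, entry lock second.
        Thread preparer = new Thread(() -> {
            cpLock.readLock().lock();      // cp read lock held
            try {
                sleep(200);
                entryLock.lock();          // lockEntry: parks, entry lock is held by "getter"
                entryLock.unlock();
            }
            finally {
                cpLock.readLock().unlock();
            }
        }, "preparer");

        // Checkpointer: queues for the cp write lock, which makes new readers wait too.
        Thread checkpointer = new Thread(() -> {
            sleep(100);
            cpLock.writeLock().lock();     // blocks until "preparer" releases its read lock
            cpLock.writeLock().unlock();
        }, "checkpointer");

        for (Thread t : new Thread[] {getter, preparer, checkpointer}) {
            t.setDaemon(true);             // let the JVM exit even though the threads stay parked
            t.start();
        }

        getter.join(TimeUnit.SECONDS.toMillis(5));

        System.out.println("getter still parked after 5s (wait cycle): " + getter.isAlive());
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}

With this interleaving all three threads stay parked indefinitely, mirroring the three-party cycle visible in the thread dump above.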

We need to investigate the order in which these locks are acquired and eliminate the possibility of such deadlocks.
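
One conventional remedy to evaluate (a rough sketch only, not a proposed patch; whether the cp read lock can in fact be taken before the entry lock on the unswap/innerGet0 path is exactly what needs to be investigated) is to enforce a single acquisition order, e.g. cp read lock first, entry lock second, released in reverse order. Using the toy locks from the sketch above:

{code:java}
// Hypothetical helper (toy code, reuses cpLock/entryLock from the sketch above):
// always take the cp read lock (outer) before the entry lock (inner), release in reverse.
static void readEntryWithConsistentOrder(Runnable readAction) {
    cpLock.readLock().lock();
    try {
        entryLock.lock();
        try {
            readAction.run(); // e.g. the unswap / innerGet0 work
        }
        finally {
            entryLock.unlock();
        }
    }
    finally {
        cpLock.readLock().unlock();
    }
}
{code}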



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
