ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152897335
Thanks @slfan1989 for your comment.
I'm sorry, but I think you may have missed the root cause of the failure of
`org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement.testSynchronousEviction`.
Please refer to the following stack trace:
```
"DataXceiver for client DFSClient_NONMAPREDUCE_-1350116008_11 at
/127.0.0.1:51273 [Receiving block BP-1502139676-192.168
.3.4-1654943490123:blk_1073741826_1002]" #146 daemon prio=5 os_prio=31
tid=0x00007fb5cee2d800 nid=0x11507 waiting on con
dition [0x000070000c8ed000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007a14b6330> (a
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:8
36)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at
org.apache.hadoop.hdfs.server.common.AutoCloseDataSetLock.lock(AutoCloseDataSetLock.java:62)
at
org.apache.hadoop.hdfs.server.datanode.DataSetLockManager.getWriteLock(DataSetLockManager.java:214)
at
org.apache.hadoop.hdfs.server.datanode.DataSetLockManager.writeLock(DataSetLockManager.java:170)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl$LazyWriter.evictBlocks(FsDatasetImpl.java:3526)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.evictLazyPersistBlocks(FsDatasetImpl.java:3656)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.reserveLockedMemory(FsDatasetImpl.java:3675)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1606)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:219)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1319)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:767)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:293)
at java.lang.Thread.run(Thread.java:748)
```
> Is it because createRbw got the read lock, which caused evictBlocks to get the write lock for a long time
evictBlocks can never acquire the write lock, because the
[createRbw logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588)
already holds the read lock of this block pool on the same thread, and that createRbw logic is in turn waiting for evictBlocks to finish. So it's a deadlock.
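For context, `java.util.concurrent.locks.ReentrantReadWriteLock` does not support upgrading a read lock to a write lock within the same thread, which is exactly the pattern in the stack above (createRbw holds the read lock, then evictBlocks requests the write lock). A minimal, self-contained sketch using plain JDK classes (not the Hadoop lock manager) that reproduces the same hang:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadToWriteUpgradeHang {
    public static void main(String[] args) {
        // Fair mode, matching the ReentrantReadWriteLock$FairSync seen in the thread dump.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

        lock.readLock().lock();            // analogous to createRbw taking the block pool read lock
        try {
            System.out.println("Read lock held; now requesting the write lock on the same thread...");
            lock.writeLock().lock();       // analogous to evictBlocks: parks forever, upgrade is not supported
            try {
                System.out.println("Never reached");
            } finally {
                lock.writeLock().unlock();
            }
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Running this, the thread parks on `writeLock().lock()` exactly as the DataXceiver thread does in the dump.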
> so will it also deadlock (when createRbw and addVolume are done at the same time)?
I'm interested in that deadlock. Could you provide the steps to reproduce it?
Thanks~