[
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780495
]
ASF GitHub Bot logged work on HDFS-16600:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Jun/22 10:45
Start Date: 11/Jun/22 10:45
Worklog Time Spent: 10m
Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152897335
Thanks @slfan1989 for your comment.
I'm sorry, but I feel you haven't grasped the root cause of the failure of
`org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement.testSynchronousEviction`.
Please refer to the stack trace below:
```
"DataXceiver for client DFSClient_NONMAPREDUCE_-1350116008_11 at
/127.0.0.1:51273 [Receiving block BP-1502139676-192.168
.3.4-1654943490123:blk_1073741826_1002]" #146 daemon prio=5 os_prio=31
tid=0x00007fb5cee2d800 nid=0x11507 waiting on con
dition [0x000070000c8ed000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007a14b6330> (a
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:8
36)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at
org.apache.hadoop.hdfs.server.common.AutoCloseDataSetLock.lock(AutoCloseDataSetLock.java:62)
at
org.apache.hadoop.hdfs.server.datanode.DataSetLockManager.getWriteLock(DataSetLockManager.java:214)
at
org.apache.hadoop.hdfs.server.datanode.DataSetLockManager.writeLock(DataSetLockManager.java:170)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl$LazyWriter.evictBlocks(FsDatasetImpl.java:3526)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.evictLazyPersistBlocks(FsDatasetImpl.java:3656)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.reserveLockedMemory(FsDatasetImpl.java:3675)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1606)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:219)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1319)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:767)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
at
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:293)
at java.lang.Thread.run(Thread.java:748)
```
> Is it because createRbw got the read lock, which caused evictBlocks to get the write lock for a long time

evictBlocks can never acquire the write lock, because [createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588) already holds the read lock of this block pool, and [createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588) is waiting for evictBlocks to finish, so it is a deadlock.
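To make the cycle concrete, here is a minimal standalone sketch (hypothetical class name, not the Hadoop code) of the same pattern: a thread that already holds the read lock of a fair `ReentrantReadWriteLock` and then requests the write lock on the same lock parks forever, because read-to-write upgrades are not supported. That is exactly the createRbw -> reserveLockedMemory -> evictBlocks path in the stack above:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal, hypothetical reproduction of the read -> write upgrade self-deadlock.
// Not the Hadoop code; it only mimics the locking pattern from the stack trace above.
public class ReadToWriteUpgradeDeadlock {
  public static void main(String[] args) {
    // Fair mode, like the FairSync shown in the jstack output.
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    lock.readLock().lock();        // analogous to createRbw taking the block-pool read lock
    try {
      System.out.println("read lock held, now trying to take the write lock...");
      lock.writeLock().lock();     // analogous to evictBlocks: parks forever, upgrade is unsupported
      try {
        System.out.println("never reached");
      } finally {
        lock.writeLock().unlock();
      }
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Running this `main` hangs on the `writeLock().lock()` call in the same `WAITING (parking)` state that the jstack output shows for the DataXceiver thread.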
> so will it also deadlock(When createRbw And addVolume are done at the same time)?

I'm interested in this deadlock; could you provide a way to reproduce it? Thanks~
Issue Time Tracking
-------------------
Worklog Id: (was: 780495)
Time Spent: 3h 50m (was: 3h 40m)
> Deadlock on DataNode
> --------------------
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> The UT
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement.testSynchronousEviction
> failed because of a deadlock, which was introduced by
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534].
> Deadlock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw line 1588 needs a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, b.getBlockPoolId()))
>
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.evictBlocks line 3526 needs a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid))
> {code}
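The AutoCloseableLock in the snippet above is what makes the try-with-resources pattern work. As a rough, hypothetical sketch (simplified, not the actual Hadoop class), such a wrapper acquires the lock when it is constructed and releases it in close(), so the lock is dropped automatically when the try block exits:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for the AutoCloseableLock idea used above.
// Acquire the lock on construction, release it when try-with-resources calls close().
public class SimpleAutoCloseLock implements AutoCloseable {
  private final Lock lock;

  public SimpleAutoCloseLock(Lock lock) {
    this.lock = lock;
    lock.lock();               // acquired here
  }

  @Override
  public void close() {
    lock.unlock();             // released automatically at the end of the try block
  }

  public static void main(String[] args) {
    ReentrantReadWriteLock bpLock = new ReentrantReadWriteLock(true);
    // Analogous to createRbw: hold the block-pool read lock only for the scope of the try block.
    try (SimpleAutoCloseLock l = new SimpleAutoCloseLock(bpLock.readLock())) {
      System.out.println("read lock held inside try-with-resources");
    } // read lock released here
  }
}
```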