[
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780461
]
ASF GitHub Bot logged work on HDFS-16600:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 11/Jun/22 01:23
Start Date: 11/Jun/22 01:23
Worklog Time Spent: 10m
Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152828770
@Hexiaoqiao @ZanderXu @tomscut
I still have some doubts about this.
1. I still hope ZanderXu can provide the exception stack trace of the deadlock; meanwhile, I will keep trying to reproduce the problem on my side.
2. I read the code of testSynchronousEviction carefully. The test uses the special storage policy LAZY_PERSIST, which asynchronously flushes in-memory blocks to disk; LazyWriter takes care of this work. Part of the code is as follows:
```java
private boolean saveNextReplica() {
  RamDiskReplica block = null;
  FsVolumeReference targetReference;
  FsVolumeImpl targetVolume;
  ReplicaInfo replicaInfo;
  boolean succeeded = false;
  try {
    block = ramDiskReplicaTracker.dequeueNextReplicaToPersist();
    if (block != null) {
      try (AutoCloseableLock lock =
          lockManager.writeLock(LockLevel.BLOCK_POOl,
              block.getBlockPoolId())) {
        replicaInfo = volumeMap.get(block.getBlockPoolId(),
            block.getBlockId());
        .....
```
If ZanderXu's judgment is correct, will this code also deadlock?
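For what it's worth, here is a rough standalone sketch (plain Java, not Hadoop code) of the difference between the two patterns, assuming the underlying lock behaves like java.util.concurrent.locks.ReentrantReadWriteLock (which the DataNode lock manager wraps). Taking the write lock with nothing held, as saveNextReplica does, is fine; requesting the write lock while the same thread already holds the read lock is what would hang:
```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UpgradeDeadlockSketch {
  public static void main(String[] args) {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Pattern 1 (like saveNextReplica): no lock held on entry, the write
    // lock is acquired directly -- it acquires and releases normally.
    rw.writeLock().lock();
    System.out.println("write lock acquired with nothing held");
    rw.writeLock().unlock();

    // Pattern 2 (the claimed createRbw -> evictBlocks path): the thread
    // already holds the read lock and then requests the write lock.
    // ReentrantReadWriteLock does not support lock upgrading, so the
    // write-lock request below blocks forever (a self-deadlock).
    rw.readLock().lock();
    System.out.println("read lock held; requesting write lock...");
    rw.writeLock().lock(); // never returns
    System.out.println("never printed");
  }
}
```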
3. I still have a question: why do we take the block pool read lock first and then the volume write lock? How was this lock ordering derived?
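As a side note on question 3: I believe the main point of a fixed acquisition order is deadlock avoidance. A rough sketch (plain Java, illustrative names only, not the actual Hadoop locks) of why two threads taking two locks in opposite orders can deadlock, while a fixed "block pool before volume" order cannot:
```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
  static final ReentrantLock blockPoolLock = new ReentrantLock();
  static final ReentrantLock volumeLock = new ReentrantLock();

  public static void main(String[] args) {
    // Thread A: block pool lock first, then volume lock.
    new Thread(() -> {
      blockPoolLock.lock();
      sleep(100); // widen the race window so the deadlock is reproducible
      volumeLock.lock();
      System.out.println("A got both");
      volumeLock.unlock();
      blockPoolLock.unlock();
    }).start();

    // Thread B: the OPPOSITE order -- volume lock first, then block pool.
    // A holds blockPoolLock and waits for volumeLock; B holds volumeLock
    // and waits for blockPoolLock: a classic deadlock. If B followed the
    // same order as A, one thread would simply wait for the other to finish.
    new Thread(() -> {
      volumeLock.lock();
      sleep(100);
      blockPoolLock.lock();
      System.out.println("B got both");
      blockPoolLock.unlock();
      volumeLock.unlock();
    }).start();
  }

  static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
```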
4. I checked lockManager.writeLock(LockLevel.BLOCK_POOl, block.getBlockPoolId()) and found that the BLOCK_POOl write lock is also taken when adding a volume, so will that path deadlock as well?
> In conclusion

I don't think this is a true deadlock. Could it be that createRbw held the read lock, which made evictBlocks wait a long time for the write lock, exceeding the JUnit test's timeout and eventually causing the failure? To solve this problem completely, I think we also need to look at the processing logic of LazyWriter; it should not be enough to just modify evictBlocks.
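To illustrate the distinction I am drawing, here is a sketch under the same ReentrantReadWriteLock assumption (again plain Java, not Hadoop code): a long-held read lock merely delays the writer, which is contention rather than deadlock, but it is enough to blow a test timeout:
```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SlowReaderSketch {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Reader thread (stand-in for createRbw) holds the read lock for 5s.
    Thread reader = new Thread(() -> {
      rw.readLock().lock();
      try {
        TimeUnit.SECONDS.sleep(5);
      } catch (InterruptedException ignored) {
      } finally {
        rw.readLock().unlock();
      }
    });
    reader.start();
    TimeUnit.MILLISECONDS.sleep(100); // let the reader acquire first

    // Writer (stand-in for evictBlocks) is delayed, not deadlocked: it
    // proceeds once the reader releases, but a test with a shorter
    // timeout would already have failed by then.
    long start = System.nanoTime();
    rw.writeLock().lock();
    try {
      System.out.printf("write lock acquired after %d ms%n",
          TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
    } finally {
      rw.writeLock().unlock();
    }
  }
}
```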
Issue Time Tracking
-------------------
Worklog Id: (was: 780461)
Time Spent: 3h 40m (was: 3.5h)
> Deadlock on DataNode
> --------------------
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> The UT
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction
> failed because of a deadlock introduced by
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534].
> Deadlock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 needs a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
>     b.getBlockPoolId()))
>
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 3526 needs a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>     bpid))
> {code}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)