[
https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218116#comment-17218116
]
Hongbing Wang commented on HDFS-15641:
--------------------------------------
Thanks [~hexiaoqiao] for your attention. There may be some confusion here.
*lifelineSender.start()* is not a direct call to Thread#start.
LifelineSender defines its own start() method, as follows:
{code:java}
// BPServiceActor$LifelineSender#start
public void start() {
  // The deadlock path goes through formatThreadName
  lifelineThread = new Thread(this,
      formatThreadName("lifeline", lifelineNnAddr));
  lifelineThread.setDaemon(true);
  //...
  lifelineThread.start(); // the thread actually starts here
}

// formatThreadName
private String formatThreadName(
    final String action,
    final InetSocketAddress addr) {
  String bpId = bpos.getBlockPoolId(true);
  //...
}

// getBlockPoolId
String getBlockPoolId(boolean quiet) {
  // avoid lock contention unless the registration hasn't completed.
  String id = bpId;
  if (id != null) {
    return id;
  }
  DataNodeFaultInjector.get().delayWhenOfferServiceHoldLock();
  readLock(); // deadlock occurs here
  //...
}
{code}
To be precise, the deadlock is between `refreshThread` and `bpThread`.
It involves the call chain shown above: *start -> formatThreadName ->
getBlockPoolId -> readLock and readUnlock*. The fix therefore ensures that
_readLock and readUnlock_ have fully completed before `bpThread` is started.
The test I provided reproduces the deadlock before the fix and passes
after the fix.
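The ordering idea can be sketched in isolation as follows. This is a minimal illustration, not the actual patch: the class StartOrderSketch, its fields, and the "BP-1" value are hypothetical stand-ins. It also assumes a particular lock cycle (the worker holds the write lock while waiting for a monitor owned by the starting thread), which may differ from the cycle recorded in jstack.log.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in classes and names; not HDFS code.
public class StartOrderSketch {
  // Stand-in for the read/write lock guarding the block pool id.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Stand-in for a monitor that the starting thread already holds (assumption).
  private final Object managerMonitor = new Object();
  private volatile String id = null;

  // Same shape as getBlockPoolId: uses the read lock only while id is unset.
  public String getId() {
    String cached = id;
    if (cached != null) {
      return cached;
    }
    lock.readLock().lock();
    try {
      return id == null ? "unregistered" : id;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Returns true if the worker finished within the timeout.
  public boolean startAll() {
    CountDownLatch done = new CountDownLatch(1);
    synchronized (managerMonitor) {
      // Take and release the read lock BEFORE the worker exists, so the
      // worker can never hold the write lock while this call is waiting.
      String name = "lifeline-" + getId();
      Thread worker = new Thread(() -> {
        lock.writeLock().lock();
        try {
          // Blocks until startAll leaves its synchronized block.
          synchronized (managerMonitor) {
            id = "BP-1";
          }
        } finally {
          lock.writeLock().unlock();
          done.countDown();
        }
      }, name);
      worker.setDaemon(true);
      worker.start();
    }
    try {
      return done.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    StartOrderSketch s = new StartOrderSketch();
    System.out.println("completed=" + s.startAll() + " id=" + s.getId());
  }
}
```

In this sketch, moving the getId() call to after worker.start() (still inside the synchronized block) would allow the cycle: the worker holds the write lock and waits for the monitor, while the starting thread holds the monitor and waits for the read lock.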
Thanks [~hexiaoqiao] again.
> DataNode could meet deadlock if invoke refreshNameNode
> ------------------------------------------------------
>
> Key: HDFS-15641
> URL: https://issues.apache.org/jira/browse/HDFS-15641
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Hongbing Wang
> Assignee: Hongbing Wang
> Priority: Critical
> Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch,
> deadlock.png, jstack.log
>
>
> DataNode could deadlock when `hdfs dfsadmin -refreshNamenodes
> hostname:50020` is invoked to register a new namespace in a federation
> environment.
> The jstack output is in jstack.log.
> The specific process is shown in the figure deadlock.png.