[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219003#comment-17219003 ]

Hongbing Wang commented on HDFS-15641:
--------------------------------------

Thanks [~ferhui] for your reply. I will explain in two parts.
(a) *How the deadlock occurs:* see the figure below. The corresponding jstack is in [^jstack.log].

!deadlock.png|width=973,height=214!

The locks involved are the lock on the `BlockPoolManager` instance (its monitor) and the read-write lock in `BPOfferService`.
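The sketch below reproduces this kind of lock-ordering cycle in isolation. It is not the DataNode code: a plain `Object` stands in for the `BlockPoolManager` instance and a `ReentrantReadWriteLock` stands in for the lock in `BPOfferService`. The class name, thread names, latches, and the choice of which thread takes which lock first are all hypothetical. One thread takes the monitor and then asks for the read lock. The other takes the write lock and then asks for the monitor. `ThreadMXBean.findDeadlockedThreads()` is then used to confirm the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical reproduction of a monitor vs. read-write lock ordering deadlock. */
public class LockOrderDeadlock {

  /**
   * Starts two daemon threads that take the two locks in opposite order and
   * returns true if the JVM reports both of them as deadlocked within timeoutMs.
   */
  static boolean reproduce(long timeoutMs) throws InterruptedException {
    Object manager = new Object(); // stand-in for the BlockPoolManager instance
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(); // stand-in for BPOfferService's lock
    CountDownLatch monitorHeld = new CountDownLatch(1);
    CountDownLatch writeLockHeld = new CountDownLatch(1);

    // Thread A: monitor first, then the read lock.
    Thread a = new Thread(() -> {
      synchronized (manager) {
        monitorHeld.countDown();
        awaitQuietly(writeLockHeld);
        rwLock.readLock().lock(); // parks forever: B holds the write lock
        rwLock.readLock().unlock();
      }
    }, "thread-A");

    // Thread B: write lock first, then the monitor.
    Thread b = new Thread(() -> {
      rwLock.writeLock().lock();
      try {
        writeLockHeld.countDown();
        awaitQuietly(monitorHeld);
        synchronized (manager) { // blocks forever: A holds the monitor
          System.out.println("unreachable");
        }
      } finally {
        rwLock.writeLock().unlock();
      }
    }, "thread-B");

    a.setDaemon(true); // daemons only so the demo JVM can still exit
    b.setDaemon(true);
    a.start();
    b.start();

    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      long[] ids = mx.findDeadlockedThreads(); // null if no deadlock is found
      if (ids != null && contains(ids, a.getId()) && contains(ids, b.getId())) {
        return true;
      }
      Thread.sleep(50);
    }
    return false;
  }

  private static boolean contains(long[] ids, long id) {
    for (long x : ids) {
      if (x == id) return true;
    }
    return false;
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("deadlock detected: " + reproduce(5_000));
  }
}
```

The two threads are daemons only so the demo process can exit. The deadlocked threads themselves never recover.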

(b) *The proposed fix:* [^HDFS-15641.002.patch] makes three changes:
 # `+BPOfferService.java+`: injects a test-only delay of 1 s. It takes effect only in tests and does not affect production. With it, both threads wait briefly after acquiring their respective locks, so the deadlocking interleaving occurs reliably.
 # `+BPServiceActor.java+`: the actual bug fix. It ensures that `bpThread` is started only after the read-locked section has completed.
 # `+TestRefreshNamenodesFailure.java+`: the test case only.
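Change 1 relies on a test-only hook that widens the race window. A common Java pattern for this is a swappable injector whose default method is a no-op. The sketch below shows that pattern using hypothetical names (`Injector`, `delayAfterLock`, `SleepingInjector`, `criticalSection`). It is an illustration under those assumptions, not the code from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a swappable test hook that widens a race window. */
public class DelayInjectorSketch {

  /** Default hook: every method is a no-op, so production behaviour is unchanged. */
  static class Injector {
    private static volatile Injector instance = new Injector();

    static Injector get() {
      return instance;
    }

    static void set(Injector injector) {
      instance = injector;
    }

    /** Called right after a lock is acquired; does nothing by default. */
    void delayAfterLock() {
    }
  }

  /** Test-only hook that sleeps while the caller still holds its lock. */
  static class SleepingInjector extends Injector {
    private final long millis;
    final AtomicInteger calls = new AtomicInteger();

    SleepingInjector(long millis) {
      this.millis = millis;
    }

    @Override
    void delayAfterLock() {
      calls.incrementAndGet();
      try {
        Thread.sleep(millis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  private static final Object LOCK = new Object();

  /** A "production" path: take the lock, then let the hook stall if a test installed one. */
  static void criticalSection() {
    synchronized (LOCK) {
      Injector.get().delayAfterLock();
      // ... real work under the lock would go here ...
    }
  }

  public static void main(String[] args) {
    criticalSection(); // default hook: no delay
    SleepingInjector testHook = new SleepingInjector(1_000); // 1 s, matching the delay in change 1
    Injector.set(testHook);
    try {
      criticalSection(); // holds LOCK for about 1 s
    } finally {
      Injector.set(new Injector()); // always restore the no-op hook
    }
    System.out.println("delays injected: " + testHook.calls.get());
  }
}
```

Because the default hook does nothing, the extra call costs almost nothing in production. A test installs the sleeping hook in both racing code paths, so each thread holds its first lock long enough for the other to grab its own.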

Applying changes 1 and 3 alone reproduces the deadlock. Applying all three (1, 2 and 3) shows that the deadlock is fixed.

The process after the fix is as follows:
!deadlock_fixed.png|width=1027,height=222!
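Change 2 boils down to an ordering rule: `bpThread` must not start running while its starter is still inside the read-locked section. The sketch below illustrates that rule in isolation. The class, the worker, and the use of `tryLock` to observe the lock state are all hypothetical and are not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: create a worker under the read lock, start it only after unlocking. */
public class StartAfterUnlock {

  /**
   * Returns whether the worker could take the write lock immediately
   * (via tryLock) when it began running.
   */
  static boolean runFixedOrdering() throws InterruptedException {
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    AtomicBoolean gotWriteLockImmediately = new AtomicBoolean(false);

    Runnable work = () -> {
      // The worker wants the write lock as soon as it runs.
      if (rwLock.writeLock().tryLock()) {
        try {
          gotWriteLockImmediately.set(true);
        } finally {
          rwLock.writeLock().unlock();
        }
      }
    };

    Thread worker;
    rwLock.readLock().lock();
    try {
      // Read-locked section: prepare state and create (but do not start) the thread.
      worker = new Thread(work, "worker");
      worker.setDaemon(true);
    } finally {
      rwLock.readLock().unlock();
    }
    // Started only after the read lock is released, so the worker never
    // overlaps with the starter's read-locked section.
    worker.start();
    worker.join();
    return gotWriteLockImmediately.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("worker got write lock immediately: " + runFixedOrdering());
  }
}
```

If `worker.start()` were moved inside the `try` block, the result would depend on thread scheduling. That is the kind of timing-dependent interleaving the injected delay in change 1 makes reproducible.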

Thanks again!

 

> DataNode could meet deadlock if invoke refreshNameNode
> ------------------------------------------------------
>
>                 Key: HDFS-15641
>                 URL: https://issues.apache.org/jira/browse/HDFS-15641
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.2.0
>            Reporter: Hongbing Wang
>            Assignee: Hongbing Wang
>            Priority: Critical
>         Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, 
> deadlock.png, deadlock_fixed.png, jstack.log
>
>
> A DataNode can deadlock when `hdfs dfsadmin -refreshNamenodes hostname:50020` is invoked to register a new namespace in a federated environment.
> The jstack is shown in jstack.log.
> The specific process is shown in the figure deadlock.png.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
