[ 
https://issues.apache.org/jira/browse/HDFS-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554199#comment-13554199
 ] 

Todd Lipcon commented on HDFS-4359:
-----------------------------------

Hi Liang Xie. I noticed that there isn't a deadlock on this node by itself; 
rather, the thread holding the lock is stuck in a 'versionRequest()' RPC. Any 
idea why this RPC is taking so long to hear back from the NN? See the thread "DataNode: 
[file:/home/work/data1/hdfs/lgprc-xiaomi/datanode,file:/home/work/data2/hdfs/lgprc-xiaomi/datanode,file:/home/work/data3/hdfs/lgprc-xiaomi/datanode,file:/home/work/data4/hdfs/lgprc-xiaomi/datanode,file:/home/work/data5/hdfs/lgprc-xiaomi/datanode,file:/home/work/data6/hdfs/lgprc-xiaomi/datanode,file:/home/work/data7/hdfs/lgprc-xiaomi/datanode,file:/home/work/data8/hdfs/lgprc-xiaomi/datanode,file:/home/work/data9/hdfs/lgprc-xiaomi/datanode,file:/home/work/data10/hdfs/lgprc-xiaomi/datanode,file:/home/work/data11/hdfs/lgprc-xiaomi/datanode,file:/home/work/data12/hdfs/lgprc-xiaomi/datanode]
  heartbeating to /10.2.201.14:11200" daemon prio=10 tid=0x00007fd34c8e4800 
nid=0xa2d in Object.wait() [0x00007fd2db3e0000]
                
> remove an unnecessary synchronized keyword in BPOfferService.java
> -----------------------------------------------------------------
>
>                 Key: HDFS-4359
>                 URL: https://issues.apache.org/jira/browse/HDFS-4359
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.0.2-alpha
>            Reporter: liang xie
>            Assignee: liang xie
>         Attachments: dn.jstack, HDFS-4359.txt
>
>
> We encountered an issue where both the NN and the DN hung; the DN hang was 
> caused by the NN not responding to heartbeats. Based on the DN thread dump, 
> I think we can make a small improvement to this code:
>   synchronized List<BPServiceActor> getBPServiceActors() {
>     return Lists.newArrayList(bpServices);
>   }
> bpServices is declared as:
>   private List<BPServiceActor> bpServices =
>     new CopyOnWriteArrayList<BPServiceActor>();
> CopyOnWriteArrayList is indeed a thread-safe variant, so IMHO we can safely 
> remove the synchronized keyword above.
> Here is a simple statistic from the thread dump: 
> xieliang@xieliang:/tmp$ grep 0x00000007b00289f0 dn.jstack |wc -l
> 252
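
For illustration, the proposal in the quoted description can be sketched roughly as below. This is a minimal standalone sketch, not the actual BPOfferService code: the class name ActorRegistry, the String element type, and the snapshot() method are hypothetical stand-ins, and the plain ArrayList copy constructor is used in place of Guava's Lists.newArrayList so the example has no dependencies. It only shows that copying a CopyOnWriteArrayList needs no external lock; it does not address whether other state in BPOfferService relies on the same monitor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the class that owns the bpServices list.
class ActorRegistry {
    // CopyOnWriteArrayList replaces its backing array on every write,
    // so readers always see a consistent snapshot without locking.
    private final List<String> actors = new CopyOnWriteArrayList<>();

    void add(String actor) {
        actors.add(actor);
    }

    // No 'synchronized' here: the ArrayList copy constructor calls
    // toArray() on the source list, which copies one consistent snapshot.
    List<String> snapshot() {
        return new ArrayList<>(actors);
    }

    public static void main(String[] args) throws InterruptedException {
        ActorRegistry registry = new ActorRegistry();

        // Writer thread mutates the list while the main thread copies it.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                registry.add("actor-" + i);
            }
        });
        writer.start();

        // Each snapshot is internally consistent; none of these calls
        // can throw ConcurrentModificationException.
        for (int i = 0; i < 100; i++) {
            registry.snapshot();
        }

        writer.join();
        System.out.println("final size: " + registry.snapshot().size());
    }
}
```

Because the getter no longer takes the object's monitor, a caller asking for the list would not queue up behind another thread that holds that monitor while waiting on a slow RPC.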

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
