[ https://issues.apache.org/jira/browse/HDFS-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554629#comment-13554629 ]
liang xie commented on HDFS-4359:
---------------------------------
Yes, the root cause is on the namenode side; this issue only concerns the
datanode. Per the thread dump and the source code, we can safely remove the
"synchronized" keyword, though that alone would not have fixed the overall hang :)
bq. the thread holding the lock is stuck in a 'versionRequest()' RPC. Any idea
why this RPC is taking a long time hearing back from the NN?
Yes, we figured it out several days ago: one of the DNS servers had failed.
The thread dump is really interesting, though, so I've uploaded the NN
thread dump for you to enjoy :) BTW, with a JUCL lock it is not easy to find
the lock holder, which made the analysis difficult for us...
> remove an unnecessary synchronized keyword in BPOfferService.java
> -----------------------------------------------------------------
>
> Key: HDFS-4359
> URL: https://issues.apache.org/jira/browse/HDFS-4359
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.0.2-alpha
> Reporter: liang xie
> Assignee: liang xie
> Attachments: dn.jstack, HDFS-4359.txt, nn_dns_broken.jstack
>
>
> We encountered an NN and DN hang. The DN hang was caused by the NN not
> responding to heartbeats. Per the DN thread dump, I think we can make a small
> improvement to this piece of code:
>   synchronized List<BPServiceActor> getBPServiceActors() {
>     return Lists.newArrayList(bpServices);
>   }
> bpServices is declared as:
>   private List<BPServiceActor> bpServices =
>     new CopyOnWriteArrayList<BPServiceActor>();
> It is indeed a thread-safe List variant, so we can safely remove the
> synchronized keyword above, IMHO.
> Here is a simple statistic from the thread dump:
> xieliang@xieliang:/tmp$ grep 0x00000007b00289f0 dn.jstack |wc -l
> 252
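
To illustrate why the copy in getBPServiceActors() does not need the monitor, here is a minimal sketch, not the actual BPOfferService code. Assumptions: the class name SnapshotSketch and its methods are hypothetical, String stands in for BPServiceActor, and new ArrayList<>(...) stands in for Guava's Lists.newArrayList(...) so the example needs only the JDK. It only shows that copying a CopyOnWriteArrayList is safe while another thread modifies it; it says nothing about other invariants the class may protect with the same lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the pattern discussed in this issue.
public class SnapshotSketch {
    // Every write replaces the backing array, so readers always see a
    // complete, immutable snapshot.
    private final List<String> actors = new CopyOnWriteArrayList<>();

    void add(String actor) { actors.add(actor); }

    void remove(String actor) { actors.remove(actor); }

    // No "synchronized": the ArrayList constructor calls toArray() on the
    // CopyOnWriteArrayList, which copies one consistent snapshot.
    List<String> getActors() { return new ArrayList<>(actors); }

    public static void main(String[] args) throws InterruptedException {
        SnapshotSketch sketch = new SnapshotSketch();
        sketch.add("actor-0");

        // Writer thread keeps adding and removing a transient element.
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= 10_000; i++) {
                sketch.add("actor-" + i);
                sketch.remove("actor-" + i);
            }
        });
        writer.start();

        // Reader copies concurrently without holding any lock.
        for (int i = 0; i < 10_000; i++) {
            List<String> copy = sketch.getActors();
            // Each copy holds actor-0 plus at most one transient element.
            if (!copy.contains("actor-0") || copy.size() > 2) {
                throw new IllegalStateException("inconsistent copy: " + copy);
            }
        }
        writer.join();
        System.out.println(sketch.getActors()); // prints [actor-0]
    }
}
```

The reader loop never throws ConcurrentModificationException, because it never iterates over a list that is being mutated in place.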