[ https://issues.apache.org/jira/browse/HDFS-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701707#comment-13701707 ]
Zesheng Wu commented on HDFS-4959:
----------------------------------
Hi Fengdong, we have already encountered the same problem, and we think the
root cause is as follows:
1. The current dfs client implementation uses a HashMap to store the
NN addresses. The keys are host0 and host1, and the values are the
corresponding NN addresses. When this HashMap is traversed in Java, host1
always comes before host0.
2. The dfs client traverses the HashMap to send requests to the NNs. It first
sends the request to the host1 NN; if that request fails or the NN on host1
is standby, the dfs client fails over and sends the request to the host0 NN.
3. In the current NN implementation, nearly all the refresh operations, such as
refreshNodes/refreshTopology/refreshUserToGroupsMappings, do not check the state
of the NN. As a result, if the request succeeds on the host1 NN, the client
does not try the host0 NN. But in our clusters, the host0 NN is active and the
host1 NN is standby most of the time.
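The three points above can be illustrated with a small self-contained sketch. It is a simulation only, not the actual dfs client or NN code: FakeNameNode, buildCluster and refreshFirstSuccess are hypothetical names, and a LinkedHashMap with host1 inserted first is used so that the traversal order described in point 1 is reproduced deterministically instead of depending on HashMap internals.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RefreshFailoverSketch {

    /** Hypothetical stand-in for a NameNode; not a Hadoop class. */
    static class FakeNameNode {
        final boolean active;
        int refreshCount = 0;

        FakeNameNode(boolean active) {
            this.active = active;
        }

        /** Mirrors point 3: the refresh runs without checking the HA state. */
        void refreshNodes() {
            refreshCount++;
        }
    }

    /** Two-NN cluster; host1 is inserted first to reproduce the observed traversal order. */
    static Map<String, FakeNameNode> buildCluster() {
        Map<String, FakeNameNode> nameNodes = new LinkedHashMap<>();
        nameNodes.put("host1", new FakeNameNode(false)); // standby
        nameNodes.put("host0", new FakeNameNode(true));  // active
        return nameNodes;
    }

    /** Mirrors point 2: try each NN in order and stop at the first call that succeeds. */
    static List<String> refreshFirstSuccess(Map<String, FakeNameNode> nameNodes) {
        List<String> contacted = new ArrayList<>();
        for (Map.Entry<String, FakeNameNode> entry : nameNodes.entrySet()) {
            contacted.add(entry.getKey());
            try {
                entry.getValue().refreshNodes();
                break; // success, so no failover to the next NN
            } catch (RuntimeException e) {
                // failure: fall through and try the next NN
            }
        }
        return contacted;
    }

    public static void main(String[] args) {
        Map<String, FakeNameNode> nameNodes = buildCluster();
        System.out.println("contacted: " + refreshFirstSuccess(nameNodes));
        for (Map.Entry<String, FakeNameNode> entry : nameNodes.entrySet()) {
            FakeNameNode nn = entry.getValue();
            System.out.println(entry.getKey() + " active=" + nn.active
                    + " refreshCount=" + nn.refreshCount);
        }
    }
}
```

In this simulation the standby accepts the refresh, so the loop exits before the active NN is contacted, which matches the symptom in the report.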
We have already verified this in our cluster. We think a complete
implementation should send requests such as
refreshNodes/refreshTopology/refreshUserToGroupsMappings to both the active and
standby NNs.
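A minimal sketch of the proposed behavior, sending the refresh to every configured NN rather than stopping at the first success, might look like the following. RefreshTarget and refreshAll are hypothetical names used for illustration and are not part of Hadoop; how failures on one NN should be reported is also an assumption here (they are collected and returned).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RefreshAllSketch {

    /** Hypothetical per-NN admin call; not a Hadoop interface. */
    interface RefreshTarget {
        void refreshNodes() throws Exception;
    }

    /**
     * Sends the refresh to every NN, active or standby, and keeps going
     * after a failure. Returns the failures keyed by NN name.
     */
    static Map<String, Exception> refreshAll(Map<String, RefreshTarget> nameNodes) {
        Map<String, Exception> failures = new LinkedHashMap<>();
        for (Map.Entry<String, RefreshTarget> entry : nameNodes.entrySet()) {
            try {
                entry.getValue().refreshNodes();
            } catch (Exception e) {
                failures.put(entry.getKey(), e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, RefreshTarget> nameNodes = new LinkedHashMap<>();
        nameNodes.put("host1", () -> System.out.println("refreshNodes on host1 (standby)"));
        nameNodes.put("host0", () -> System.out.println("refreshNodes on host0 (active)"));
        Map<String, Exception> failures = refreshAll(nameNodes);
        System.out.println("failed NNs: " + failures.keySet());
    }
}
```

With this shape the traversal order no longer matters, because every NN receives the request regardless of which one is contacted first.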
> Decommission data nodes, There is no response
> ---------------------------------------------
>
> Key: HDFS-4959
> URL: https://issues.apache.org/jira/browse/HDFS-4959
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, hdfs-client, namenode
> Affects Versions: 2.0.5-alpha
> Reporter: Fengdong Yu
>
> "dfs.hosts.exclude" was configured before the NN started. Active/Standby works
> well.
> 1) Add the IP addresses of two datanodes to the exclude file on both the
> Active and Standby NN.
> 2) Run: hdfs dfsadmin -refreshNodes on the Active NN. No logs appear in the
> Active NN log, but decommission logs appear in the Standby NN log.
> No decommissioning datanodes are shown on the Active NN web UI, but
> decommissioning datanodes are shown on the Standby NN web UI.
> The decommission process is also very slow, and it appears it will never
> finish.
> I do think this is a bug.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira