[jira] [Commented] (HDFS-4959) Decommission data nodes, There is no response

Zesheng Wu (JIRA) Sun, 07 Jul 2013 17:20:18 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701707#comment-13701707
 ]


Zesheng Wu commented on HDFS-4959:
----------------------------------

Hi Fengdong, we have encountered the same problem, and we think the 
root cause is as follows:
1. In the current implementation of the dfs client, a HashMap stores the 
NN addresses. The keys are host0 and host1, and the values are the 
corresponding NN addresses. When this Java HashMap is traversed, host1 always 
comes before host0.
2. The dfs client traverses the HashMap to send requests to the NNs. It first 
sends the request to the host1 NN; if that request fails or the NN on host1 
is standby, the dfs client fails over and sends the request to the host0 NN.
3. In the current implementation of the NN, nearly all of the refresh 
operations, such as refreshNodes/refreshTopology/refreshUserToGroupsMappings, 
do not check the state of the NN. As a result, if the request succeeds on the 
host1 NN, the client does not try the host0 NN. But in our clusters, host0 NN 
is active and host1 NN is standby most of the time.
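
The three points above can be modelled in a few lines of plain Java. This is only an illustrative sketch, not the dfs client code: FailoverSketch, FakeNamenode, refreshFirstSuccess and keyOrder are hypothetical names, and the address strings are placeholders. It relies on two facts: java.util.HashMap makes no guarantee about iteration order, and a "stop at the first success" loop never reaches the second NN when the first one accepts the call without checking its HA state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FailoverSketch {
    /** Hypothetical stand-in for a NameNode; not a Hadoop class. */
    static final class FakeNamenode {
        final String address;
        final boolean active;
        boolean refreshed = false;

        FakeNamenode(String address, boolean active) {
            this.address = address;
            this.active = active;
        }

        /** Models a refresh call that does not check HA state, so a standby accepts it too. */
        void refreshNodes() {
            refreshed = true;
        }
    }

    /** Tries each NN in map iteration order and stops at the first success. */
    static String refreshFirstSuccess(Map<String, FakeNamenode> namenodes) {
        for (Map.Entry<String, FakeNamenode> e : namenodes.entrySet()) {
            try {
                e.getValue().refreshNodes();
                return e.getKey(); // success: remaining NNs are never contacted
            } catch (RuntimeException ex) {
                // failure: fall through and try the next NN
            }
        }
        throw new IllegalStateException("no NameNode accepted the request");
    }

    /** Returns the keys in the order the map iterates them. */
    static List<String> keyOrder(Map<String, FakeNamenode> namenodes) {
        return new ArrayList<>(namenodes.keySet());
    }

    public static void main(String[] args) {
        Map<String, FakeNamenode> nns = new HashMap<>();
        nns.put("host0", new FakeNamenode("addr-of-host0", true));  // active
        nns.put("host1", new FakeNamenode("addr-of-host1", false)); // standby

        System.out.println("iteration order: " + keyOrder(nns));
        System.out.println("refresh handled by: " + refreshFirstSuccess(nns));
        for (FakeNamenode nn : nns.values()) {
            System.out.println(nn.address + " active=" + nn.active
                    + " refreshed=" + nn.refreshed);
        }
    }
}
```

Whichever key the map happens to yield first is the only NN that sees the refresh; if that NN is the standby, the active NN is never updated.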

We have already verified this in our cluster. We think a complete 
implementation should send requests such as 
refreshNodes/refreshTopology/refreshUserToGroupsMappings to both the active 
and standby NNs.
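
A minimal sketch of that idea, under stated assumptions: BroadcastRefreshSketch, RefreshTarget and refreshAll are hypothetical names, not Hadoop interfaces, and the real client would need its own error reporting. The point is only that the loop contacts every configured NN and records a per-NN result instead of returning at the first success.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BroadcastRefreshSketch {
    /** Hypothetical abstraction of one NameNode's refresh call; not a Hadoop interface. */
    interface RefreshTarget {
        void refreshNodes() throws Exception;
    }

    /**
     * Sends the refresh to every configured NN, active or standby,
     * and records whether each call succeeded.
     */
    static Map<String, Boolean> refreshAll(Map<String, RefreshTarget> namenodes) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, RefreshTarget> e : namenodes.entrySet()) {
            try {
                e.getValue().refreshNodes();
                results.put(e.getKey(), true);
            } catch (Exception ex) {
                // Record the failure and keep going so the other NNs are still refreshed.
                results.put(e.getKey(), false);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, RefreshTarget> nns = new LinkedHashMap<>();
        nns.put("host0", () -> { /* pretend the refresh succeeded */ });
        nns.put("host1", () -> { throw new Exception("simulated failure"); });

        // Every NN is contacted; the caller can report which ones failed.
        System.out.println(refreshAll(nns));
    }
}
```

Collecting a result per NN lets the admin command report a partial failure rather than silently leaving one NN with stale settings.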
                
> Decommission data nodes, There is no response
> ---------------------------------------------
>
>                 Key: HDFS-4959
>                 URL: https://issues.apache.org/jira/browse/HDFS-4959
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, hdfs-client, namenode
>    Affects Versions: 2.0.5-alpha
>            Reporter: Fengdong Yu
>
> "dfs.hosts.exclude" was configured before the NN started. Active/Standby 
> works well. 
> 1) Add the IP addresses of two datanodes to the exclude file on both the 
> Active and Standby NN. 
> 2) Run: hdfs dfsadmin -refreshNodes on the Active NN. Nothing is logged in 
> the Active NN log, but decommission logs appear in the Standby NN log.
> No decommissioning datanodes are shown on the Active NN web UI, but 
> decommissioning datanodes are shown on the Standby NN web UI.
> The decommission process is also very slow, and it appears that it will 
> never finish.
> I do think this is a bug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
