[jira] [Commented] (HDFS-3744) Decommissioned nodes are included in cluster after switch which is not expected

Aaron T. Myers (JIRA) Mon, 06 Aug 2012 06:41:03 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429144#comment-13429144
 ]


Aaron T. Myers commented on HDFS-3744:
--------------------------------------

bq. And I would like to add Standby check at replication monitor to avoid load 
in cluster.

Got it. This seems like a separate issue from what's being discussed here, 
though, and so should probably be done as a separate JIRA. Do you agree?

bq. By persisting into edit logs we can be sure of which DN is decommissioned? 
Not only by Standby NN but also when Standalone NN restarts.

The question that I have is still "How would differences be rectified between 
what's persisted in the edit log and what's present in the excluded hosts 
file?" Imagine that some host is not present in the excluded hosts file, but a 
decommission action for that host is present in the edit log. Given that edit 
logs are occasionally merged into an fsimage and the edit logs discarded, this 
would imply that we'd need to introduce a new section into the fsimage for 
per-host DN status. This means that we'd end up with two potentially out of 
sync lists of DN decommission status: one in the excludes file, the other in 
this new section of the fsimage file.
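
To make the concern concrete, here is a minimal sketch of what "rectifying" the two lists would involve. It is only an illustration: the function name, the dictionary keys, and the host names (dn1, dn2) are hypothetical and do not correspond to any existing NameNode code or fsimage format.

```python
def find_conflicts(excluded_hosts, persisted_decommissioned):
    """Return hosts whose decommission status differs between the two sources.

    excluded_hosts: hosts listed in the excluded hosts file (hypothetical input).
    persisted_decommissioned: hosts marked decommissioned in a hypothetical
    per-host section of the fsimage / edit log.
    """
    excluded = set(excluded_hosts)
    persisted = set(persisted_decommissioned)
    return {
        "only_in_excludes_file": sorted(excluded - persisted),
        "only_in_persisted_state": sorted(persisted - excluded),
    }


# The case described above: dn1 has a persisted decommission action
# but is absent from the excluded hosts file.
conflicts = find_conflicts(["dn2"], ["dn1", "dn2"])
print(conflicts)
```

Any non-empty entry is a host for which the NN would have to pick a winner, and it is not obvious which source should take precedence.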

My point is that I think persisting DN decommission status to the edit log / 
fsimage is not an unreasonable idea, but it does seem like an idea that's 
incompatible with the excluded hosts config file. Given that, I'm still in 
favor of just requiring the admin to keep the excluded hosts files in sync and 
to call refreshNodes on both NNs from DFSAdmin. I think this argument is further 
supported by the fact that the active/standby NN having an out of sync view of 
DN decommission status isn't actually that big of a problem. Yes, it might 
result in some unnecessary replication traffic, but it shouldn't result in data 
loss or unavailability, since DNs already ignore replication commands from 
anything but the active NN.
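
As a rough sketch of that admin workflow, the following builds one refreshNodes invocation per NN, using the same "-fs ... -refreshNodes" form as the command in the issue description. The host names are placeholders and port 8020 is assumed from the description. The script only prints the commands rather than running them, and it assumes the excluded hosts files have already been synced by some other means.

```python
def refresh_nodes_commands(namenode_hosts, port=8020):
    """Build one 'hdfs dfsadmin -refreshNodes' command per NameNode.

    namenode_hosts: NN host names (placeholders here, not real hosts).
    port: NN RPC port; 8020 is assumed from the issue description.
    """
    return [
        ["hdfs", "dfsadmin", "-fs", f"hdfs://{host}:{port}", "-refreshNodes"]
        for host in namenode_hosts
    ]


# Hypothetical host names for the active and standby NNs.
commands = refresh_nodes_commands(["nn1.example.com", "nn2.example.com"])
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line could then be run by the admin (or passed to subprocess.run) once the excluded hosts file is identical on both NNs.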
                
> Decommissioned nodes are included in cluster after switch which is not 
> expected
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-3744
>                 URL: https://issues.apache.org/jira/browse/HDFS-3744
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.0.0-alpha, 2.1.0-alpha, 2.0.1-alpha
>            Reporter: Brahma Reddy Battula
>
> Scenario:
> =========
> Start the ANN and SNN with three DNs.
> Exclude DN1 from the cluster using the decommission feature 
> (./hdfs dfsadmin -fs hdfs://ANNIP:8020 -refreshNodes).
> After the decommission succeeds, do a switch so that the SNN becomes Active.
> The excluded node (DN1) is now included in the cluster again, and files can be 
> written to it since it is no longer excluded.
> Checked the UIs: the SNN (which was Active before the switch) shows 
> decommissioned=1, while the ANN shows decommissioned=0.
> One more Observation:
> ====================
> All dfsadmin commands create a proxy only on nn1, irrespective of whether it 
> is Active or Standby. I think we need to re-look at this as well.
> I don't understand why HA is not provided for dfsadmin commands.
> Please correct me if I am wrong.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
