[
https://issues.apache.org/jira/browse/HDFS-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429144#comment-13429144
]
Aaron T. Myers commented on HDFS-3744:
--------------------------------------
bq. And I would like to add a Standby check to the replication monitor to avoid
load on the cluster.
Got it. This seems like a separate issue from what's being discussed here,
though, and so should probably be done as a separate JIRA. Do you agree?
bq. By persisting into the edit logs, can we be sure of which DN is decommissioned?
Not only on the Standby NN but also when a standalone NN restarts.
The question that I have is still "How would differences be rectified between
what's persisted in the edit log and what's present in the excluded hosts
file?" Imagine that some host is not present in the excluded hosts file, but a
decommission action for that host is present in the edit log. Given that edit
logs are occasionally merged into an fsimage and the edit logs discarded, this
would imply that we'd need to introduce a new section into the fsimage for
per-host DN status. This means that we'd end up with two potentially out of
sync lists of DN decommission status: one in the excludes file, the other in
this new section of the fsimage file.
My point is that I think persisting DN decommission status to the edit log /
fsimage is not an unreasonable idea, but it does seem like an idea that's
incompatible with the excluded hosts config file. Given that, I'm still in
favor of just requiring the admin to keep the excluded hosts files in sync and
to call refreshNodes on both NNs from DFSAdmin. I think this argument is further
supported by the fact that the active/standby NNs having an out-of-sync view of
DN decommission status isn't actually that big of a problem. Yes, it might
result in some unnecessary replication traffic, but it shouldn't result in data
loss or unavailability, since DNs already ignore replication commands from
anything but the active NN.
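As a rough illustration of the manual approach described above (keep the
excluded hosts files in sync, then refresh both NNs), here is a minimal Python
sketch. The helper names (read_hosts, exclude_drift, refresh_commands), the
file paths and the NN URIs are hypothetical. The parser assumes
whitespace-separated host names with optional '#' comments, which may not match
every Hadoop version. The sketch only builds the dfsadmin command lines in the
form used in the issue description; it does not run them.

```python
from pathlib import Path


def read_hosts(path):
    """Return the set of host names listed in an exclude file.

    Assumption: entries are separated by whitespace, and anything after
    a '#' on a line is a comment.
    """
    hosts = set()
    for line in Path(path).read_text().splitlines():
        entries = line.split("#", 1)[0].split()
        hosts.update(entries)
    return hosts


def exclude_drift(file_a, file_b):
    """Return (only_in_a, only_in_b) as sorted lists of host names.

    Two empty lists mean both NNs would see the same excluded hosts.
    """
    a, b = read_hosts(file_a), read_hosts(file_b)
    return sorted(a - b), sorted(b - a)


def refresh_commands(nn_uris):
    """Build one 'hdfs dfsadmin -fs <uri> -refreshNodes' command per NN.

    The commands are returned as argument lists and are not executed here.
    """
    return [["hdfs", "dfsadmin", "-fs", uri, "-refreshNodes"] for uri in nn_uris]
```

An admin script could call exclude_drift on copies of the two NNs' exclude
files, stop if either list is non-empty, and otherwise pass the result of
refresh_commands(["hdfs://nn1:8020", "hdfs://nn2:8020"]) to subprocess.run one
command at a time. The URIs here are placeholders.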
> Decommissioned nodes are included in cluster after switch which is not
> expected
> -------------------------------------------------------------------------------
>
> Key: HDFS-3744
> URL: https://issues.apache.org/jira/browse/HDFS-3744
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.0.0-alpha, 2.1.0-alpha, 2.0.1-alpha
> Reporter: Brahma Reddy Battula
>
> Scenario:
> =========
> Start the ANN and SNN with three DNs.
> Exclude DN1 from the cluster by using the decommission feature
> (./hdfs dfsadmin -fs hdfs://ANNIP:8020 -refreshNodes).
> After the decommission succeeds, do a switch so that the SNN becomes active.
> The excluded node (DN1) is now included in the cluster again. Files can be
> written to it, since the new active NN does not treat it as excluded.
> Checked the UIs: the NN that was active before the switch shows
> decommissioned=1, and the NN that is active now shows decommissioned=0.
> One more observation:
> =====================
> All dfsadmin commands create a proxy only to nn1, irrespective of whether it
> is active or standby. I think we need to revisit this as well.
> I do not understand why HA is not supported for dfsadmin commands.
> Please correct me if I am wrong.