[
https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615227#comment-14615227
]
Kihwal Lee commented on HDFS-8693:
----------------------------------
I do agree that {{refreshNamenodes}} needs to be fixed. This command does not
work for federated HA clusters. Also, if one service actor thread shuts down in
HA, there is no way to start it up again without restarting the datanode. The
datanode should shut down in this case, or {{refreshNamenodes}} should be fixed
to work with HA.
> refreshNamenodes does not support adding a new standby to a running DN
> ----------------------------------------------------------------------
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, ha
> Affects Versions: 2.6.0
> Reporter: Jian Fang
> Priority: Critical
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA
> support
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new
> one so that I don't need to restart the data nodes. However, I got the
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException(
>         "HA does not currently support adding a new standby to a running DN. "
>         + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>   }
> }
> It looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, bringing up a new name node on a replacement instance is
> critical for auto provisioning a Hadoop cluster with HDFS HA support. Without
> this support, the HA feature cannot really be used. I also observed that the
> new standby name node on the replacement instance could get stuck in safe
> mode because no data nodes check in with it. Even with a rolling restart, it
> may take quite some time to restart all data nodes in a big cluster, for
> example one with 4000 data nodes. Moreover, restarting DNs is intrusive and
> is not a preferable operation in production. It also increases the chance of
> a double failure, because the standby name node is not really ready for a
> failover if the current active name node fails.
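To illustrate why replacing one name node trips the check quoted above, here is a minimal standalone sketch of the same symmetric-difference test. It is not the Hadoop code: the class name RefreshCheckSketch, the method wouldReject, and the example host names are hypothetical, and plain java.util sets stand in for Guava's Sets helper. Any address that appears in only one of the old and new lists makes the difference non-empty, so a swapped standby is rejected even though the number of name nodes is unchanged.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; mirrors only the address comparison from the quoted snippet.
public class RefreshCheckSketch {

    /** Returns the elements that are present in exactly one of the two sets. */
    public static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> onlyInA = new HashSet<>(a);
        onlyInA.removeAll(b);
        Set<T> onlyInB = new HashSet<>(b);
        onlyInB.removeAll(a);
        onlyInA.addAll(onlyInB);
        return onlyInA;
    }

    /** True when the quoted check would throw, i.e. the address sets differ at all. */
    public static boolean wouldReject(Set<InetSocketAddress> oldAddrs,
                                      List<InetSocketAddress> newAddrs) {
        return !symmetricDifference(oldAddrs, new HashSet<>(newAddrs)).isEmpty();
    }

    public static void main(String[] args) {
        // Unresolved addresses avoid DNS lookups; host names are made up.
        InetSocketAddress nn1 = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
        InetSocketAddress nn2 = InetSocketAddress.createUnresolved("nn2.example.com", 8020);
        InetSocketAddress nn3 = InetSocketAddress.createUnresolved("nn3.example.com", 8020);

        Set<InetSocketAddress> current = new HashSet<>(Arrays.asList(nn1, nn2));

        // Same name nodes, different order: the sets are equal.
        System.out.println(wouldReject(current, Arrays.asList(nn2, nn1)));
        // nn2 replaced by nn3: both nn2 and nn3 land in the symmetric difference.
        System.out.println(wouldReject(current, Arrays.asList(nn1, nn3)));
    }
}
```

A fix along the lines discussed in this issue would presumably need to act on the two halves of that difference separately (stop actors for removed addresses, start actors for added ones) instead of throwing; that part is not sketched here.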
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)