[ https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062512#comment-16062512 ]
Chen Zhang commented on HDFS-8693:
----------------------------------
Hi [~vinayrpet] [~ajithshetty], any progress on this issue?
Support for adding a new standby is very useful when operating large clusters.
When one of the machines running a namenode goes down and we have to add a new
standby, restarting thousands of datanodes takes a very long time. If the
active namenode crashes during this time, the whole cluster becomes unavailable.
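For illustration only, here is a minimal sketch of how the single symmetricDifference check in the refreshNNList snippet quoted below could be split into "added" and "removed" address sets, which is the information a DN would need to start actors for new NNs and stop actors for old ones. The class NamenodeListDiff and its methods are hypothetical names of my own and are not part of Hadoop or of the attached patches. It uses only java.util instead of Guava, and the host names nn1, nn2 and nn3 and port 8020 are assumed example values.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hadoop.
public class NamenodeListDiff {

    // Addresses present in the new configuration but not in the old one.
    static Set<InetSocketAddress> added(Set<InetSocketAddress> oldAddrs,
                                        Set<InetSocketAddress> newAddrs) {
        Set<InetSocketAddress> result = new HashSet<>(newAddrs);
        result.removeAll(oldAddrs);
        return result;
    }

    // Addresses present in the old configuration but not in the new one.
    static Set<InetSocketAddress> removed(Set<InetSocketAddress> oldAddrs,
                                          Set<InetSocketAddress> newAddrs) {
        Set<InetSocketAddress> result = new HashSet<>(oldAddrs);
        result.removeAll(newAddrs);
        return result;
    }

    public static void main(String[] args) {
        // createUnresolved avoids DNS lookups for the assumed example hosts.
        Set<InetSocketAddress> oldAddrs = new HashSet<>(Arrays.asList(
                InetSocketAddress.createUnresolved("nn1", 8020),
                InetSocketAddress.createUnresolved("nn2", 8020)));
        // nn2 has been replaced by nn3 in the new configuration.
        Set<InetSocketAddress> newAddrs = new HashSet<>(Arrays.asList(
                InetSocketAddress.createUnresolved("nn1", 8020),
                InetSocketAddress.createUnresolved("nn3", 8020)));

        for (InetSocketAddress a : added(oldAddrs, newAddrs)) {
            System.out.println("would add:    " + a.getHostString() + ":" + a.getPort());
        }
        for (InetSocketAddress a : removed(oldAddrs, newAddrs)) {
            System.out.println("would remove: " + a.getHostString() + ":" + a.getPort());
        }
    }
}
```

The union of the two sets is exactly the symmetric difference that the current code rejects. What the DN should then do with each set (for example, registering with the new standby) is the real work of this issue and is not covered by the sketch.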
> refreshNamenodes does not support adding a new standby to a running DN
> ----------------------------------------------------------------------
>
> Key: HDFS-8693
> URL: https://issues.apache.org/jira/browse/HDFS-8693
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, ha
> Affects Versions: 2.6.0
> Reporter: Jian Fang
> Assignee: Ajith S
> Priority: Critical
> Attachments: HDFS-8693.02.patch, HDFS-8693.1.patch
>
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA
> support
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new
> one so that I don't need to restart the data nodes. However, I got the
> following error:
> refreshNamenodes: HA does not currently support adding a new standby to a
> running DN. Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code
> snippet, which led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException(
>         "HA does not currently support adding a new standby to a running DN. "
>         + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>   }
> }
> It looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, bringing up a new name node on a replacement instance is
> critical for auto provisioning a Hadoop cluster with HDFS HA support. Without
> this support, the HA feature cannot really be used. I also observed that the
> new standby name node on the replacement instance can get stuck in safe mode
> because no data nodes check in with it. Even with a rolling restart, it may
> take quite some time to restart all data nodes in a big cluster, for example
> one with 4000 data nodes. Besides, restarting DNs is far too intrusive and is
> not a preferable operation in production. It also increases the chance of a
> double failure, because the standby name node is not really ready for a
> failover if the current active name node fails.