[ https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605934#comment-14605934 ]
Jian Fang commented on HDFS-1623:
---------------------------------
Could someone please respond to this issue? Support for a new name node on a
replacement instance is critical for auto-provisioning a Hadoop cluster with
HDFS HA support in the cloud. Without this support, the HA feature cannot
really be used. I also observed that the new standby name node on the
replacement instance can get stuck in safe mode because no data nodes check in
with it. Even with a rolling restart, it may take quite some time to restart
all data nodes in a big cluster, for example one with 4000 data nodes. Besides,
restarting DNs is far too intrusive and is not a preferred operation in
production. It also increases the chance of a double failure, because the
standby name node is not really ready for a failover if the current active
name node fails. This is really a big issue.
Please at least give us some pointers on why it is difficult to support
adding a new standby to a running DN, and what we need to pay attention to if
we have to implement this ourselves.
Thanks again.
> High Availability Framework for HDFS NN
> ---------------------------------------
>
> Key: HDFS-1623
> URL: https://issues.apache.org/jira/browse/HDFS-1623
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Sanjay Radia
> Fix For: 2.0.0-alpha
>
> Attachments: HA-tests.pdf, HDFS-1623.rel23.patch,
> HDFS-1623.trunk.patch, HDFS-High-Availability.pdf, NameNode HA_v2.pdf,
> NameNode HA_v2_1.pdf, Namenode HA Framework.pdf, dfsio-results.tsv,
> ha-testplan.pdf, ha-testplan.tex
>
>