[ 
https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605934#comment-14605934
 ] 

Jian Fang commented on HDFS-1623:
---------------------------------

Could someone please respond to this issue? Support for a new name node on a 
replacement instance is critical for auto-provisioning a Hadoop cluster with 
HDFS HA support in the cloud. Without this support, the HA feature cannot 
really be used. I also observed that the new standby name node on the 
replacement instance can get stuck in safe mode because no data nodes check in 
with it. Even with a rolling restart, restarting all data nodes can take quite 
some time on a big cluster, for example one with 4000 data nodes. Restarting 
DNs is also far too intrusive and is not a preferred operation in production. 
It also increases the chance of a double failure, because the standby name node 
is not really ready for a failover if the current active name node fails. This 
is really a big issue. 

Please at least give us some pointers on why it is difficult to support 
adding a new standby to a running DN, and on what we need to pay attention to 
if we have to implement this ourselves. 

Thanks again.


> High Availability Framework for HDFS NN
> ---------------------------------------
>
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>             Fix For: 2.0.0-alpha
>
>         Attachments: HA-tests.pdf, HDFS-1623.rel23.patch, 
> HDFS-1623.trunk.patch, HDFS-High-Availability.pdf, NameNode HA_v2.pdf, 
> NameNode HA_v2_1.pdf, Namenode HA Framework.pdf, dfsio-results.tsv, 
> ha-testplan.pdf, ha-testplan.tex
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
