[ 
https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070643#comment-13070643
 ] 

dhruba borthakur commented on HDFS-1623:
----------------------------------------

Hi Yang, the ZK heartbeats and delivery of notifications are not synchronous 
with the HDFS writes to its transaction logs. Assume a scenario where ZK 
delivers a disconnected event to the NameNode, but the NameNode is already in 
the midst of flushing a long list of transactions to its transaction logs. 
This flush could take a non-trivial amount of time (process scheduling, GC 
issues, etc).

Todd: what is our proposed solution for doing IO fencing on transaction logs 
that reside on an NFS filer? Here we are proposing that we do the following 
before we can do an auto failover:

1. If the original NameNode is reachable, kill the original NameNode process 
and verify that it is killed.
2. If step 1 fails (because of a network connectivity issue), then issue a 
power-cycle event to the original NameNode machine via its configured console 
port. Verify that the machine is power-cycled.
3. If step 2 fails, then abort auto failover. Otherwise continue the failover 
sequence.
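
To make the ordering of the three steps concrete, here is a minimal sketch of 
the decision logic only. It is not HDFS code: the names FenceResult, 
fence_original_namenode, kill_and_verify and power_cycle_and_verify are 
hypothetical, and the two callables stand in for whatever site-specific 
mechanism actually kills the process or drives the console port. Each callable 
is assumed to return True only when the action was both performed and 
verified; an exception is treated the same as a failure.

```python
from enum import Enum
from typing import Callable


class FenceResult(Enum):
    PROCESS_KILLED = "process_killed"   # step 1 succeeded
    POWER_CYCLED = "power_cycled"       # step 2 succeeded
    ABORT_FAILOVER = "abort_failover"   # step 3: neither could be verified


def _attempt(action: Callable[[], bool]) -> bool:
    """Run a fencing action; any exception counts as 'not verified'."""
    try:
        return bool(action())
    except Exception:
        return False


def fence_original_namenode(
    kill_and_verify: Callable[[], bool],
    power_cycle_and_verify: Callable[[], bool],
) -> FenceResult:
    # Step 1: kill the original NameNode process and verify it is gone.
    if _attempt(kill_and_verify):
        return FenceResult.PROCESS_KILLED
    # Step 2: fall back to a power cycle via the console port, and verify it.
    if _attempt(power_cycle_and_verify):
        return FenceResult.POWER_CYCLED
    # Step 3: fencing could not be verified, so auto failover must not proceed.
    return FenceResult.ABORT_FAILOVER


def may_continue_failover(result: FenceResult) -> bool:
    """Failover continues only if one of the fencing steps was verified."""
    return result is not FenceResult.ABORT_FAILOVER
```

The point of the sketch is that failover is gated on a verified fence, not on 
the ZK event alone: if neither step can be confirmed, the safe answer is to 
abort.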

> High Availability Framework for HDFS NN
> ---------------------------------------
>
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: HDFS-High-Availability.pdf, NameNode HA_v2.pdf, NameNode 
> HA_v2_1.pdf, Namenode HA Framework.pdf
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
