[
https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070643#comment-13070643
]
dhruba borthakur commented on HDFS-1623:
----------------------------------------
Hi Yang, The ZK heartbeats and the delivery of notifications happen
asynchronously to the HDFS writes to its transaction logs. Assume a scenario
where ZK delivers a disconnected event to the NameNode while the NameNode is
already in the midst of flushing a long list of transactions to its transaction
logs. The flush could take a non-trivial amount of time (process scheduling,
GC issues, etc.).
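To make the race concrete, here is a toy model in Python (not HDFS code). It
assumes a flush loop that does not consult the ZK session state between
transactions; the function name, the parameters, and the integer "clock" are
all hypothetical and exist only for illustration.

```python
def writes_after_disconnect(flush_start, per_txn_cost, num_txns, disconnect_at):
    """Return the ids of transactions written after the disconnect event.

    Toy model: the flush started at `flush_start`, each transaction takes
    `per_txn_cost` time units to reach the log, and the loop never checks
    whether the ZK session is still alive (an assumption of this sketch).
    """
    late = []
    now = flush_start
    for txn_id in range(num_txns):
        now += per_txn_cost  # time at which this transaction hits the log
        if now > disconnect_at:
            late.append(txn_id)
    return late


# A five-transaction flush that is interrupted by a disconnect at t=2.
print(writes_after_disconnect(flush_start=0, per_txn_cost=1,
                              num_txns=5, disconnect_at=2))
```

In this model, every transaction that lands after the disconnect was written
by a NameNode that ZK already considers gone, which is why the notification
alone cannot serve as a fence.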
Todd: what is our proposed solution for doing IO fencing on transaction logs
that reside on an NFS filer? Here we are proposing that we do the following
before we can do an auto failover:
1. If the original NameNode is reachable, kill the original NameNode process
and verify that it is killed.
2. If step 1 fails (because of a network connectivity issue), then issue a
power-cycle event to the original NameNode machine via its configured console
port. Verify that the machine is power-cycled.
3. If step 2 fails, then abort auto failover. Otherwise continue the failover
sequence.
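
The three steps above can be sketched as a small decision procedure. This is
only an illustration in Python, not an existing HDFS API: the names
fence_old_namenode, auto_failover, and the callables passed in (reachability
check, kill-and-verify, power-cycle-and-verify, promote) are all hypothetical
placeholders for whatever mechanism a deployment actually uses.

```python
from enum import Enum
from typing import Callable


class FenceResult(Enum):
    KILLED = "killed"
    POWER_CYCLED = "power_cycled"
    ABORT = "abort"


def fence_old_namenode(
    is_reachable: Callable[[], bool],
    kill_and_verify: Callable[[], bool],
    power_cycle_and_verify: Callable[[], bool],
) -> FenceResult:
    """Try each fencing step in order; all callables are hypothetical hooks."""
    # Step 1: if the old NameNode is reachable, kill it and verify the kill.
    if is_reachable() and kill_and_verify():
        return FenceResult.KILLED
    # Step 2: step 1 failed, so power-cycle the machine and verify that.
    if power_cycle_and_verify():
        return FenceResult.POWER_CYCLED
    # Step 3: neither step could be verified, so failover must be aborted.
    return FenceResult.ABORT


def auto_failover(
    fence: Callable[[], FenceResult],
    promote_standby: Callable[[], None],
) -> bool:
    """Promote the standby only after fencing has been verified."""
    if fence() is FenceResult.ABORT:
        return False
    promote_standby()
    return True
```

The point of the ordering is that the standby is never promoted unless one of
the two fencing steps has been positively verified.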
> High Availability Framework for HDFS NN
> ---------------------------------------
>
> Key: HDFS-1623
> URL: https://issues.apache.org/jira/browse/HDFS-1623
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
> Attachments: HDFS-High-Availability.pdf, NameNode HA_v2.pdf, NameNode
> HA_v2_1.pdf, Namenode HA Framework.pdf
>
>