[ https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070631#comment-13070631 ]
Suresh Srinivas commented on HDFS-2179:
---------------------------------------
Case 1), where the active and standby are in communication and co-operating,
does not require fencing at all. Fencing is required only when the active and
standby cannot communicate, so we should drop that case from the cases to
consider.

When using solutions such as LinuxHA, a local process (the LRM) kills the
process to be fenced, so no ssh to the node is required. HDFS-2185 should
consider this requirement. In the first phase I might start with LinuxHA to
play around with this, since I think getting a rock-solid and correct
fail-over controller is non-trivial.
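
To make the distinction above concrete, here is a rough sketch of the decision, not a proposal for the actual fail-over controller. All names in it (FenceMethod, failover, graceful_standby, "local-kill") are hypothetical and exist only for illustration. It also assumes that a cooperative hand-off that fails is treated the same as an unreachable peer, which the comment above does not state.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FenceMethod:
    """Hypothetical fencing method: a name plus a callable returning True on success."""
    name: str
    fence: Callable[[str], bool]


def failover(old_active: str,
             peer_reachable: bool,
             graceful_standby: Callable[[str], bool],
             methods: List[FenceMethod]) -> str:
    """Make it safe for the standby to become active.

    Returns a short description of what was done, and raises if the
    old active could not be made safe.
    """
    # Cooperating case: the old active is asked to step down, no fencing.
    if peer_reachable and graceful_standby(old_active):
        return "graceful"
    # Assumption: an unreachable peer, or a failed hand-off, must be fenced.
    # Methods are tried in order; a local kill (as an LRM would do) needs
    # no ssh, so it could simply be listed first.
    for method in methods:
        if method.fence(old_active):
            return "fenced:" + method.name
    # Never become active while the old active might still write edits.
    raise RuntimeError("could not fence " + old_active)
```

With a single local-kill method configured, a reachable and co-operating peer yields "graceful" and the fence callable is never invoked. An unreachable peer yields "fenced:local-kill". With no working method the function raises rather than letting the standby take over.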
> HA: namenode fencing mechanism
> ------------------------------
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is
> active at a time has to be preserved in order to prevent "split brain
> syndrome." Thus, when a standby NN is transitioned to the "active" state during a
> failover, it needs to somehow _fence_ the formerly active NN to ensure that
> it can no longer perform edits. This JIRA is to discuss and implement NN
> fencing.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira