[
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068700#comment-13068700
]
Todd Lipcon commented on HDFS-2179:
-----------------------------------
h3. Fencing overview
In order to fence a NN, there are several different methods, at varying levels
of nastiness:
1) Cooperative active->standby transition or shutdown
In the case of a manual failover, the old primary can gracefully either
transition to a standby mode, or gracefully shut down. In this case, since we
assume the software to be cooperative, no real "fencing" is necessary -- the
new NN just needs to unambiguously confirm that the old NN has dropped out of
active mode.
This method succeeds only if the old NN remains in full operation.
2) Process killing or death verification (eg via ssh or a second daemon)
In the case that the old primary has either hung (eg deadlock) or crashed (eg
JVM segfault), but the host is OK, the new primary may contact that host and
send SIGKILL to the NameNode JVM. This may be done either via ssh or via
contacting some process which is still running on the node. It is also
sufficient to _verify_ that the NN process is no longer running in the case
that its JVM crashed.
This method succeeds only if the _host_ of the old NN remains in full
operation, despite the NN itself being deadlocked or crashed.
3) Storage fencing
Depending on the type of storage in which the old NN stores its edits
directories, the new NN may explicitly fence the storage. This is typically
accomplished using a vendor-specific extension. For example, NetApp filers
support the command "exportfs -b enable save <nnhost.com> /vol/vol0" which can
be remotely issued in order to disallow any further access to a particular
mount by a particular host.
In the case of edits stored on BookKeeper in the future, we may be able to
implement some kind of lease revocation or fencing within that storage system.
4) Network port fencing
Many switches support remote management. One way to prevent a NameNode from
responding to any further requests is to forcibly disable its network port. An
alternative similar mechanism is to use something like a LOM card to remotely
disable the NIC.
5) Power port fencing (aka STONITH)
Many power distribution units (PDUs) support remote management. The last ditch
effort to fence a node is to literally "pull the power"
h3. Proposal
Since methods 3-5 above are usually vendor-specific implementations, it does
not make sense to try to implement a catch-all fencing mechanism within Hadoop.
Instead, operators are likely to want to use commonly available shell scripts
that work against their preferred hardware. Given this, I would propose that
Hadoop's fencing behavior be:
- Configure a list of "fence methods", each with an associated priority.
- Each fence method returns an exit code indicating whether it has successfully
fenced the target node.
- If any method succeeds, no further method is attempted.
- If a method fails, continue down the list to try the next method.
- If all fence methods fail, then both nodes remain in "standby" state, and an
administrator must manually force the transition after verifying that the other
node is no longer active.
The first fence method will always be the "cooperative" method. We can also
ship with Hadoop an implementation of method #2
(shoot-the-other-process-in-the-head via ssh). Methods 3-5 would probably be
fulfilled by custom site-specific shell scripts, example snippets on a wiki, or
existing tools like the fence_* programs that are available from Red Hat.
h3. Open questions
- do we need to have any kind of framework for unfencing built in to Hadoop? Or
is it up to an administrator to "unfence"?
- is it actually a good idea to include "Cooperative shutdown" in this same
framework? or should we only call fence when we know it's uncooperative?
> HA: namenode fencing mechanism
> ------------------------------
>
> Key: HDFS-2179
> URL: https://issues.apache.org/jira/browse/HDFS-2179
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: name-node
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is
> active at a time has to be preserved in order to prevent "split brain
> syndrome." Thus, when a standby NN is transition to "active" state during a
> failover, it needs to somehow _fence_ the formerly active NN to ensure that
> it can no longer perform edits. This JIRA is to discuss and implement NN
> fencing.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira