[ 
https://issues.apache.org/jira/browse/HDFS-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068700#comment-13068700
 ] 

Todd Lipcon commented on HDFS-2179:
-----------------------------------


h3. Fencing overview

In order to fence a NN, there are several different methods, at varying levels 
of nastiness:

1) Cooperative active->standby transition or shutdown
In the case of a manual failover, the old primary can gracefully either 
transition to a standby mode, or gracefully shut down. In this case, since we 
assume the software to be cooperative, no real "fencing" is necessary -- the 
new NN just needs to unambiguously confirm that the old NN has dropped out of 
active mode.

This method succeeds only if the old NN remains in full operation.

2) Process killing or death verification (eg via ssh or a second daemon)
In the case that the old primary has either hung (eg deadlock) or crashed (eg 
JVM segfault), but the host is OK, the new primary may contact that host and 
send SIGKILL to the NameNode JVM. This may be done either via ssh or via 
contacting some process which is still running on the node. It is also 
sufficient to _verify_ that the NN process is no longer running in the case 
that its JVM crashed.

This method succeeds only if the _host_ of the old NN remains in full 
operation, despite the NN itself being deadlocked or crashed.

3) Storage fencing
Depending on the type of storage in which the old NN stores its edits 
directories, the new NN may explicitly fence the storage. This is typically 
accomplished using a vendor-specific extension. For example, NetApp filers 
support the command "exportfs -b enable save <nnhost.com> /vol/vol0" which can 
be remotely issued in order to disallow any further access to a particular 
mount by a particular host.

In the case of edits stored on BookKeeper in the future, we may be able to 
implement some kind of lease revocation or fencing within that storage system.

4) Network port fencing
Many switches support remote management. One way to prevent a NameNode from 
responding to any further requests is to forcibly disable its network port. An 
alternative similar mechanism is to use something like a LOM card to remotely 
disable the NIC.

5) Power port fencing (aka STONITH)
Many power distribution units (PDUs) support remote management. The last ditch 
effort to fence a node is to literally "pull the power"


h3. Proposal

Since methods 3-5 above are usually vendor-specific implementations, it does 
not make sense to try to implement a catch-all fencing mechanism within Hadoop. 
Instead, operators are likely to want to use commonly available shell scripts 
that work against their preferred hardware. Given this, I would propose that 
Hadoop's fencing behavior be:

- Configure a list of "fence methods", each with an associated priority.
- Each fence method returns an exit code indicating whether it has successfully 
fenced the target node.
- If any method succeeds, no further method is attempted.
- If a method fails, continue down the list to try the next method.
- If all fence methods fail, then both nodes remain in "standby" state, and an 
administrator must manually force the transition after verifying that the other 
node is no longer active.

The first fence method will always be the "cooperative" method. We can also 
ship with Hadoop an implementation of method #2 
(shoot-the-other-process-in-the-head via ssh). Methods 3-5 would probably be 
fulfilled by custom site-specific shell scripts, example snippets on a wiki, or 
existing tools like the fence_* programs that are available from Red Hat.

h3. Open questions
- do we need to have any kind of framework for unfencing built in to Hadoop? Or 
is it up to an administrator to "unfence"?
- is it actually a good idea to include "Cooperative shutdown" in this same 
framework? or should we only call fence when we know it's uncooperative?


> HA: namenode fencing mechanism
> ------------------------------
>
>                 Key: HDFS-2179
>                 URL: https://issues.apache.org/jira/browse/HDFS-2179
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In an HA cluster, when there are two NNs, the invariant that only one NN is 
> active at a time has to be preserved in order to prevent "split brain 
> syndrome." Thus, when a standby NN is transition to "active" state during a 
> failover, it needs to somehow _fence_ the formerly active NN to ensure that 
> it can no longer perform edits. This JIRA is to discuss and implement NN 
> fencing.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to