[
https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404285#comment-13404285
]
Aaron T. Myers commented on HDFS-3561:
--------------------------------------
I think some wires are getting crossed here. Some clarifications:
* The ZKFC *always* performs the act of fencing, by executing the configured
fencing methods.
* There are two fencing methods shipped out of the box: 1) RPC to the active NN
to tell it to move to the standby state, 2) ssh to the active NN and `kill -9`
the NN process.
* You can optionally configure more fencing methods, for example IP-based
shared storage fencing, or IP-based STONITH via PDU fencing.
* The ZKFC proceeds to execute the various fencing methods in the order they're
configured.
* One of the stated aims of the HA work was to favor data reliability over
availability. So, if we can't guarantee the correctness of the data, we
shouldn't cause a state transition.
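For concreteness, the ordered fencing methods above are configured via
dfs.ha.fencing.methods in hdfs-site.xml. A sketch, assuming a one-line-per-method
value and a hypothetical site-specific fence script (the script path and ssh
user/port are placeholders, not defaults):

{noformat}
<!-- hdfs-site.xml: the ZKFC tries these methods in the order listed,
     stopping at the first one that succeeds. -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <!-- 1) ssh to the old active as user "hdfs" on port 22 and kill the NN;
       2) fall back to a custom script, e.g. IP-based shared-storage or
          PDU/STONITH fencing (path below is hypothetical). -->
  <value>sshfence(hdfs:22)
shell(/path/to/custom-fence.sh)</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
{noformat}

The graceful RPC to the active NN is attempted before these configured methods;
the list here only covers the forcible fencers.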
Given all of the above, at least one of the fencing methods *must* succeed
before the ZKFC can reasonably cause the standby to transition to active.
Imagine a network failure wherein the ZKFCs can no longer reach the active NN,
but can reach the standby. If we just try to ping the active NN for a while,
without having successfully fenced it, and then transition the standby to
active since we can't ping the previous active, then both NNs might be active
simultaneously, write to the shared storage and corrupt the FS metadata. This
isn't acceptable.
As I said previously, I'm very much in favor of lowering the number of graceful
fencing retries to a reasonable value. Todd recommended 0 or 1, which sounds
fine by me. What I'm not in favor of is changing the ZKFC to ever cause the
standby to become active without *some* fencing method succeeding.
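The invariant being argued for can be sketched in a few lines: run the
configured fencing methods in order, and allow the standby-to-active transition
only if at least one of them reports success. This is an illustrative sketch,
not the actual o.a.h.ha.NodeFencer API; the class and method names are made up.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Minimal sketch of the ZKFC fencing invariant: methods run in configured
 * order, and failover may proceed only if some method succeeded.
 */
public class FenceSketch {

    /** Returns true iff at least one fencing method succeeded. */
    static boolean fence(List<BooleanSupplier> methods) {
        for (BooleanSupplier m : methods) {
            if (m.getAsBoolean()) {
                return true;   // old active is fenced: safe to fail over
            }
        }
        return false;          // nothing succeeded: must NOT fail over
    }

    public static void main(String[] args) {
        // Graceful RPC fails (network to the old active is down), but the
        // ssh-kill fencer succeeds, so the standby may become active.
        boolean canFailover =
            fence(List.<BooleanSupplier>of(() -> false, () -> true));

        // Every method fails: the standby stays standby, even after the
        // ZK session timeout, because split-brain is worse than downtime.
        boolean mustStay =
            !fence(List.<BooleanSupplier>of(() -> false, () -> false));

        System.out.println(canFailover && mustStay);  // prints "true"
    }
}
```

Lowering the graceful-fence retry count only shortens the first step of this
loop; it never removes the requirement that some step succeed.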
> ZKFC retries 45 times to connect to the other NN during fencing when the
> network between NNs is broken, and the standby NN will not take over as active
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-3561
> URL: https://issues.apache.org/jira/browse/HDFS-3561
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: auto-failover
> Reporter: suja s
> Assignee: Vinay
>
> Scenario:
> Active NN on machine1
> Standby NN on machine2
> Machine1 is isolated from the network (machine1 network cable unplugged)
> After zk session timeout ZKFC at machine2 side gets notification that NN1 is
> not there.
> ZKFC tries to failover NN2 as active.
> As part of this during fencing it tries to connect to machine1 and kill NN1.
> (sshfence technique configured)
> This connection retry happens 45 times (as governed by
> ipc.client.connect.max.socket.retries).
> Also after that standby NN is not able to take over as active (because of
> fencing failure).
> Suggestion: if the ZKFC is not able to reach the other NN within a specified
> time/number of retries, it can consider that NN dead and instruct the standby
> NN to take over as active, since there is no chance of the other NN (NN1)
> retaining its active state after its ZK session times out while it is
> isolated from the network.
> From ZKFC log:
> {noformat}
> 2012-06-21 17:46:14,378 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 22 time(s).
> 2012-06-21 17:46:35,378 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 23 time(s).
> 2012-06-21 17:46:56,378 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 24 time(s).
> 2012-06-21 17:47:17,378 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 25 time(s).
> 2012-06-21 17:47:38,382 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 26 time(s).
> 2012-06-21 17:47:59,382 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 27 time(s).
> 2012-06-21 17:48:20,386 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 28 time(s).
> 2012-06-21 17:48:41,386 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 29 time(s).
> 2012-06-21 17:49:02,386 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 30 time(s).
> 2012-06-21 17:49:23,386 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 31 time(s).
> {noformat}
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira