[ 
https://issues.apache.org/jira/browse/HDFS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444505#comment-13444505
 ] 

Todd Lipcon commented on HDFS-3867:
-----------------------------------

I've been thinking about how to attack this. I have two options which I think 
are viable:

*Option 1: "eager" rolling*

Whenever the JN receives a request which is "out of sync" (e.g. some txns were 
skipped, or it restarted in the middle of a segment), it should respond with a 
special exception. The client side then catches this exception and sets a flag 
indicating that the JN is out of sync and needs a log roll in order to re-join.

In the NN code, we would then add another call to JournalManager such as 
{{isAutoLogRollNeeded()}}. After every transaction, we check this flag, and if 
set, then we trigger a log roll so that the out-of-sync JN gets picked up again 
in the next segment.
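
To make the flag-and-roll flow above concrete, here is a minimal sketch. Only 
{{isAutoLogRollNeeded()}} comes from the proposal; every other name 
(JournalOutOfSyncException, FakeJournalNode, JournalClient, logAndMaybeRoll) is 
hypothetical and not existing HDFS code. It models a single JN, whereas in the 
real system the write would still succeed on a quorum of the other JNs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: all names except isAutoLogRollNeeded() are hypothetical.
class EagerRollSketch {

    /** Stand-in for the "special exception" an out-of-sync JN responds with. */
    static class JournalOutOfSyncException extends Exception {
        JournalOutOfSyncException(String msg) { super(msg); }
    }

    /** Toy JN: accepts only the next expected txid of the current segment. */
    static class FakeJournalNode {
        long nextExpectedTxId = 1;
        boolean inSync = true;   // set to false to simulate a mid-segment restart

        void journal(long txId) throws JournalOutOfSyncException {
            if (!inSync || txId != nextExpectedTxId) {
                inSync = false;
                throw new JournalOutOfSyncException(
                    "expected txid " + nextExpectedTxId + ", got " + txId);
            }
            nextExpectedTxId++;
        }

        /** Starting a new segment lets the JN re-join. */
        void startSegment(long firstTxId) {
            nextExpectedTxId = firstTxId;
            inSync = true;
        }
    }

    /** Client side: catches the exception and remembers that a roll is needed. */
    static class JournalClient {
        private final FakeJournalNode jn;
        private final AtomicBoolean rollNeeded = new AtomicBoolean(false);
        int rolls = 0;

        JournalClient(FakeJournalNode jn) { this.jn = jn; }

        void logEdit(long txId) {
            try {
                jn.journal(txId);
            } catch (JournalOutOfSyncException e) {
                rollNeeded.set(true);
            }
        }

        boolean isAutoLogRollNeeded() { return rollNeeded.get(); }

        void rollLog(long firstTxIdOfNewSegment) {
            jn.startSegment(firstTxIdOfNewSegment);
            rollNeeded.set(false);
            rolls++;
        }
    }

    /** NN side: after every transaction, check the flag and roll if it is set. */
    static void logAndMaybeRoll(JournalClient client, long txId) {
        client.logEdit(txId);
        if (client.isAutoLogRollNeeded()) {
            client.rollLog(txId + 1);
        }
    }
}
```

In this sketch the out-of-sync JN misses the transaction that triggered the 
exception and resumes with the first txid of the new segment.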

The downside of this approach is that we also need to periodically check this 
flag even if edits are not getting written. Otherwise, a rolling restart while 
the NN had no traffic would not "notice" the issue until the next attempt to 
commit, at which point all loggers would fail.

The upside of this is that the above "heartbeat" to the loggers is probably 
necessary anyway in order to have some semblance of "read fencing" (i.e. ensure 
that a NN doesn't continue to serve reads for arbitrarily long after a 
failover has occurred).

----
*Option 2: "recovery after failed quorum"*

Change the error handling path on the client to support a "self recovery" code 
path. Currently, we only initiate the QJM recovery protocol on startup, but we 
could instead change it so that, if any quorum call fails, it tries the 
recovery code path at that point. So, if a quorum of nodes have gotten out of 
sync, it would initiate recovery, thus closing the currently open log segment, 
and then start a new segment from that point forward.

The advantage of this is that it's a bit more general, and could be enhanced 
with a configuration to allow it to keep retrying recovery for some number of 
seconds before bailing, thus allowing an NN to ride over some period of network 
partition, etc. The downside is that it might create a "dueling NNs" 
situation in an actual failover scenario, which would be no good.
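
As a rough illustration of the retry window described above, the following 
sketch retries a recovery attempt until a deadline passes. All names 
(QuorumRecoverySketch, hasQuorum, recoverWithRetries, retryWindowMillis) are 
hypothetical, the recovery protocol itself is abstracted behind a callback, and 
the clock is injected so the loop can be exercised without real waiting.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Sketch only: hypothetical names, not the QJM client API.
class QuorumRecoverySketch {

    /** True if a strict majority of loggers acknowledged the call. */
    static boolean hasQuorum(boolean[] acks) {
        int ok = 0;
        for (boolean acked : acks) {
            if (acked) {
                ok++;
            }
        }
        return ok > acks.length / 2;
    }

    /**
     * Called when a quorum call fails. Runs tryRecovery (which would close the
     * open segment and start a new one) repeatedly until it succeeds or the
     * retry window has elapsed. Returns false if the caller should bail.
     * A real implementation would likely sleep between attempts.
     */
    static boolean recoverWithRetries(BooleanSupplier tryRecovery,
                                      LongSupplier clockMillis,
                                      long retryWindowMillis) {
        long deadline = clockMillis.getAsLong() + retryWindowMillis;
        do {
            if (tryRecovery.getAsBoolean()) {
                return true;
            }
        } while (clockMillis.getAsLong() < deadline);
        return false;
    }
}
```

A caller would invoke recoverWithRetries only after hasQuorum returns false for 
a write, and abort the NN if it returns false. The sketch does nothing about 
the dueling-NNs risk mentioned above.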

My hunch is that option 1 is the better bet, and it doesn't preclude 
implementing something like option 2 in the future. Thoughts?
                
> QJM: Support rolling restart of JNs
> -----------------------------------
>
>                 Key: HDFS-3867
>                 URL: https://issues.apache.org/jira/browse/HDFS-3867
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In order to perform upgrades or other maintenance, it is useful to be able to 
> perform a rolling restart of the journal nodes while the NameNode is active.
> Currently, this does not work, because the NN only picks up restarted JNs 
> again on the beginning of the next log segment. So, if the NN does not roll 
> after each node is restarted in turn, the NN will eventually fail to commit 
> to a quorum and crash.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
