[ https://issues.apache.org/jira/browse/HDFS-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444505#comment-13444505 ]
Todd Lipcon commented on HDFS-3867:
-----------------------------------
I've been thinking about how to attack this. I have two options which I think
are viable:
*Option 1: "eager" rolling*
Whenever the JN receives a request which is "out of sync" (e.g. some txns were
skipped, or it restarted in the middle of a segment), it should respond with a
special exception. The client side then catches this exception and sets a flag
indicating that the JN is out of sync and needs a log roll in order to re-join.
In the NN code, we would then add another call to JournalManager such as
{{isAutoLogRollNeeded()}}. After every transaction, we check this flag, and if
set, then we trigger a log roll so that the out-of-sync JN gets picked up again
in the next segment.
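A minimal Java sketch of Option 1, under simplifying assumptions: a toy in-memory JN that only accepts contiguous txids inside an open segment, and an NN-side writer that counts acks itself. {{JournalOutOfSyncException}}, {{ToyJournalNode}} and {{EagerRollingJournalManager}} are hypothetical names, not existing Hadoop classes; only {{isAutoLogRollNeeded()}} comes from the proposal above, and the "roll" here simply opens a new segment on every JN without finalizing the old one.

```java
import java.util.List;

/** Hypothetical exception a JN could throw when a write does not continue its open segment. */
class JournalOutOfSyncException extends Exception {
  JournalOutOfSyncException(String msg) {
    super(msg);
  }
}

/** Toy JN: accepts only contiguous txids inside the segment it currently has open. */
class ToyJournalNode {
  private boolean inSegment = false; // false after a restart until the next segment starts
  private long lastTxId;

  void startSegment(long firstTxId) {
    inSegment = true;
    lastTxId = firstTxId - 1;
  }

  void restart() {
    inSegment = false; // a restarted JN has lost track of the open segment
  }

  void journal(long txId) throws JournalOutOfSyncException {
    if (!inSegment || txId != lastTxId + 1) {
      throw new JournalOutOfSyncException("cannot accept txid " + txId);
    }
    lastTxId = txId;
  }
}

/** NN-side sketch of "eager" rolling over a set of toy JNs (hypothetical class). */
class EagerRollingJournalManager {
  private final List<ToyJournalNode> loggers;
  private boolean rollNeeded = false; // set when any JN reports it is out of sync
  private long nextTxId = 1;
  int rolls = 0;

  EagerRollingJournalManager(List<ToyJournalNode> loggers) {
    this.loggers = loggers;
    startSegment();
  }

  private void startSegment() {
    for (ToyJournalNode jn : loggers) {
      jn.startSegment(nextTxId);
    }
  }

  /** Counterpart of the proposed isAutoLogRollNeeded() check. */
  boolean isAutoLogRollNeeded() {
    return rollNeeded;
  }

  /** Writes one txn; returns true if a majority of JNs accepted it. */
  boolean logEdit() {
    long txId = nextTxId++;
    int acks = 0;
    for (ToyJournalNode jn : loggers) {
      try {
        jn.journal(txId);
        acks++;
      } catch (JournalOutOfSyncException e) {
        rollNeeded = true; // remember that this JN needs a roll to re-join
      }
    }
    return acks > loggers.size() / 2;
  }

  /** After every transaction, roll if some JN asked for it, so it re-joins in the new segment. */
  boolean logEditAndMaybeRoll() {
    boolean committed = logEdit();
    if (committed && isAutoLogRollNeeded()) {
      startSegment(); // toy roll: real QJM would finalize the old segment first
      rollNeeded = false;
      rolls++;
    }
    return committed;
  }
}
```

With three JNs, restarting each one in turn with an edit after each restart keeps every edit committed, since each restart triggers a roll that brings the restarted JN back before the next one goes down. With plain {{logEdit()}} and no roll, the edit after the second restart is accepted by only one of the three JNs and fails to reach a quorum, which is the crash described in this issue.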
The downside of this approach is that we also need to periodically check this
flag even if edits are not getting written. Otherwise, a rolling restart while
the NN had no traffic would not "notice" the issue until the next attempt to
commit, at which point all loggers would fail.
The upside of this is that the above "heartbeat" to the loggers is probably
necessary anyway in order to have some semblance of "read fencing" (i.e. ensure
that an NN doesn't continue to serve reads for arbitrarily long after a
failover has occurred).
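The periodic check could be sketched as a heartbeat round that runs on a timer whether or not edits are flowing. This assumes a hypothetical per-JN probe reporting one of three states ({{IN_SYNC}}, {{NEEDS_ROLL}}, {{UNREACHABLE}}) and a hypothetical roll hook; none of these are existing QJM APIs. The boolean result stands in for the read-fencing signal: if a quorum stops responding, the NN can no longer confirm it is still the active writer and could stop serving reads.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Sketch of a periodic JN "heartbeat" that runs even when no edits are being written. */
class IdleRollChecker {
  /** Hypothetical result of probing one JN. */
  enum JnStatus {
    IN_SYNC,    // responded and is part of the open segment
    NEEDS_ROLL, // responded, but is not part of the open segment (e.g. restarted)
    UNREACHABLE // did not respond
  }

  /** Hypothetical per-JN probe. */
  interface JournalProbe {
    JnStatus heartbeat();
  }

  private final List<JournalProbe> probes;
  private final Runnable rollLog; // hypothetical hook for the NN's log roll
  private ScheduledExecutorService timer;

  IdleRollChecker(List<JournalProbe> probes, Runnable rollLog) {
    this.probes = probes;
    this.rollLog = rollLog;
  }

  /**
   * One heartbeat round. Rolls the log if a quorum responded and some JN needs a roll.
   * Returns whether a quorum responded at all (the read-fencing signal).
   */
  boolean checkOnce() {
    int responding = 0;
    boolean rollNeeded = false;
    for (JournalProbe p : probes) {
      JnStatus s = p.heartbeat();
      if (s != JnStatus.UNREACHABLE) {
        responding++;
      }
      if (s == JnStatus.NEEDS_ROLL) {
        rollNeeded = true;
      }
    }
    boolean quorum = responding > probes.size() / 2;
    if (quorum && rollNeeded) {
      rollLog.run();
    }
    return quorum;
  }

  /** Runs checkOnce() every periodMillis on a daemon thread. */
  void start(long periodMillis) {
    timer = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "jn-heartbeat");
      t.setDaemon(true);
      return t;
    });
    timer.scheduleAtFixedRate(this::checkOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  void stop() {
    if (timer != null) {
      timer.shutdownNow();
    }
  }
}
```

Keeping the check on a timer, rather than only after each transaction, is what lets an idle NN notice a restarted JN before its next write.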
----
*Option 2: "recovery after failed quorum"*
Change the error-handling path on the client to support a "self-recovery" code
path. Currently, we only initiate the QJM recovery protocol on startup, but we
could instead change it so that, if any quorum call fails, it tries the
recovery code path at that point. So, if a quorum of nodes have gotten out of
sync, it would initiate recovery, thus closing the currently open log segment,
and then start a new segment from that point forward.
The advantage of this is that it's a bit more general, and could be enhanced
with a configuration to allow it to keep retrying recovery for some number of
seconds before bailing, thus allowing an NN to ride over some period of network
partition, etc. The downside is that it might end up creating a "dueling NNs"
situation in an actual failover scenario, which would be no good.
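Option 2's retry-until-deadline behavior could look like the following sketch. {{Recovery.recoverAndStartNewSegment()}} is a hypothetical hook standing in for the existing QJM recovery path, and the ride-over duration is a hypothetical setting, not an existing configuration key. The clock is injected so the deadline logic can be exercised without real waiting. What to do with the edit whose quorum call failed is deliberately left out.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Sketch of Option 2: after a failed quorum call, keep retrying recovery until a deadline. */
class SelfRecoveringJournal {
  /** Hypothetical stand-in for QJM recovery: close the open segment, then start a new one. */
  interface Recovery {
    void recoverAndStartNewSegment() throws Exception;
  }

  private final Recovery recovery;
  private final Duration rideOver;      // hypothetical setting: how long to keep trying
  private final Duration retryInterval; // pause between attempts
  private final LongSupplier nanoClock; // System::nanoTime in practice; injectable for tests

  SelfRecoveringJournal(Recovery recovery, Duration rideOver, Duration retryInterval,
      LongSupplier nanoClock) {
    this.recovery = recovery;
    this.rideOver = rideOver;
    this.retryInterval = retryInterval;
    this.nanoClock = nanoClock;
  }

  /**
   * Called when a quorum call has failed. Returns once recovery succeeds (new segment open);
   * otherwise rethrows the latest failure once the ride-over period is used up, at which
   * point the NN would abort as it does today.
   */
  void recoverAfterQuorumFailure() throws Exception {
    long deadline = nanoClock.getAsLong() + rideOver.toNanos();
    while (true) {
      try {
        recovery.recoverAndStartNewSegment();
        return;
      } catch (Exception e) {
        if (nanoClock.getAsLong() - deadline >= 0) {
          throw e; // ride-over period exhausted: give up
        }
      }
      Thread.sleep(retryInterval.toMillis());
    }
  }
}
```

One caveat the sketch does not capture: QJM recovery begins by claiming a new writer epoch on the JNs, so two NNs both running this loop during a real failover could keep fencing each other, which is the "dueling NNs" risk above.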
My hunch is that option 1 is the better bet, and it doesn't preclude
implementing something like option 2 in the future. Thoughts?
> QJM: Support rolling restart of JNs
> -----------------------------------
>
> Key: HDFS-3867
> URL: https://issues.apache.org/jira/browse/HDFS-3867
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> In order to perform upgrades or other maintenance, it is useful to be able to
> perform a rolling restart of the journal nodes while the NameNode is active.
> Currently, this does not work, because the NN only picks up restarted JNs
> again at the beginning of the next log segment. So, if the NN does not roll
> after each node is restarted in turn, the NN will eventually fail to commit
> to a quorum and crash.