[
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097741#comment-17097741
]
Konstantin Shvachko commented on HDFS-15323:
--------------------------------------------
Took some time to figure out the problem.
# The main problem is in {{EditLogTailer.catchupDuringFailover()}}, which is
called by {{startActiveServices()}} during failover.
{{catchupDuringFailover()}} runs {{doTailEdits()}} once and returns, while
{{doTailEdits()}} is designed to fetch one portion of edits and return. Fetching
one portion per call is fine for periodic tailing, since each subsequent
iteration gets the next portion.
But it is not enough for {{catchupDuringFailover()}}, because catching up
must read all transactions up to the last state of the previous Active
NN, whereas a single call reads at most a fixed number of edits
({{QJM_RPC_MAX_TXNS}}). So if the StandbyNode has fallen far behind, one
portion of edits may not be enough for it to fully catch up.
# StandbyNode can fall far behind during the checkpoint process. It calls
{{saveNamespace()}}, which holds the namesystem lock, and during this time
{{EditLogTailer}} cannot apply transactions to the namespace. I compared
transaction ids on the StandbyNode and the Active while the standby was
creating a checkpoint. The standby was ~1M transactions behind at some point,
while between checkpoints the lag is around 100 tx. So if you fail over during
this time, the checkpoint thread is interrupted and image saving is cancelled,
but catching up is not done properly.
You need a fairly large image to hit this error.
We caught this problem in our 2.10 branch, but it exists in all HDFS versions.
Attaching a unit test to reproduce the problem.
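
To illustrate the difference between tailing one portion and catching up fully, here is a toy model in plain Java. It is only a sketch and does not use any Hadoop classes: {{CatchupSketch}}, {{tailOnce()}}, {{catchupOnce()}}, {{catchupFully()}} and the limit of 1000 transactions per call are all hypothetical names and values chosen for illustration, and the loop shows one possible way to catch up, not necessarily the fix applied in this issue.

```java
/** Toy model of edit tailing on a standby. Not Hadoop code. */
class CatchupSketch {
    /** Hypothetical per-call cap, standing in for a limit like QJM_RPC_MAX_TXNS. */
    static final long MAX_TXNS_PER_CALL = 1000L;

    /** Last transaction id the simulated standby has applied. */
    long lastAppliedTxId = 0L;

    /**
     * Models a single tailing call: applies at most MAX_TXNS_PER_CALL
     * transactions and returns how many were applied.
     */
    long tailOnce(long lastWrittenTxId) {
        long available = lastWrittenTxId - lastAppliedTxId;
        long applied = Math.min(available, MAX_TXNS_PER_CALL);
        lastAppliedTxId += applied;
        return applied;
    }

    /** Catch-up that tails exactly once, as described in item 1 above. */
    void catchupOnce(long lastWrittenTxId) {
        tailOnce(lastWrittenTxId);
    }

    /** Catch-up that keeps tailing until a call applies no transactions. */
    void catchupFully(long lastWrittenTxId) {
        while (tailOnce(lastWrittenTxId) > 0) {
            // Each iteration applies the next portion of edits.
        }
    }

    public static void main(String[] args) {
        long activeTxId = 1_000_000L; // standby is ~1M transactions behind

        CatchupSketch single = new CatchupSketch();
        single.catchupOnce(activeTxId);
        System.out.println("single call applied up to txid " + single.lastAppliedTxId);

        CatchupSketch looping = new CatchupSketch();
        looping.catchupFully(activeTxId);
        System.out.println("looping applied up to txid " + looping.lastAppliedTxId);
    }
}
```

With a lag of about 100 tx both variants end at the same transaction id, which is consistent with the problem showing up only when the standby is far behind, for example during a checkpoint.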
> StandbyNode fails transition to active due to insufficient transaction tailing
> ------------------------------------------------------------------------------
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, qjm
> Affects Versions: 2.7.7
> Reporter: Konstantin Shvachko
> Priority: Major
>
> StandbyNode is asked to {{transitionToActive()}}. If it has fallen too far
> behind in tailing journal transactions (from QJM), it can crash with
> {{IllegalStateException}}.