[ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097741#comment-17097741
 ] 

Konstantin Shvachko commented on HDFS-15323:
--------------------------------------------

Took some time to figure out the problem.
# The main problem is in {{EditLogTailer.catchupDuringFailover()}}, which is 
called by {{startActiveServices()}} during failover.
 {{catchupDuringFailover()}} runs {{doTailEdits()}} once and returns, while 
{{doTailEdits()}} is designed to fetch one portion of edits and return. Fetching 
one portion per call is fine for periodic tailing, since each subsequent 
iteration gets the next portion.
 But it is not enough for {{catchupDuringFailover()}}, because catching up 
must read all transactions up to the last state of the previous Active NN, 
whereas a single call reads at most a fixed number of edits 
({{QJM_RPC_MAX_TXNS}}). So if the StandbyNode has fallen far behind, one 
portion of edits may not bring it fully up to date.
# StandbyNode can fall far behind during the checkpoint process. It calls 
{{saveNamespace()}}, which holds the namesystem lock, and during this time 
{{EditLogTailer}} cannot apply transactions to the namespace. I compared 
transaction ids on the StandbyNode and the Active while the standby was 
creating a checkpoint. The standby was ~1M transactions behind at some point, 
while between checkpoints the lag is around 100 tx. So if you fail over during 
this time, the checkpoint thread is interrupted and image saving is cancelled, 
but catching up is not done properly.
You need a fairly large image to hit this error.
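
To illustrate the difference between a single capped tailing pass and a full catch-up, here is a minimal toy model. It is not the Hadoop code: {{Journal}}, {{Tailer}}, {{MAX_TXNS_PER_CALL}} (standing in for {{QJM_RPC_MAX_TXNS}}, with an assumed value of 5000) and both catch-up methods are hypothetical names made up for this sketch, and looping until a pass returns nothing is only one possible way to catch up fully.

```java
public class CatchupSketch {
    // Hypothetical per-call cap, standing in for QJM_RPC_MAX_TXNS (value assumed).
    static final int MAX_TXNS_PER_CALL = 5000;

    /** Toy journal holding transaction ids 1..lastTxId (not the real QJM). */
    static class Journal {
        final long lastTxId;

        Journal(long lastTxId) {
            this.lastTxId = lastTxId;
        }

        /** Number of transactions one fetch returns after afterTxId, capped. */
        long fetch(long afterTxId) {
            return Math.min(lastTxId - afterTxId, MAX_TXNS_PER_CALL);
        }
    }

    /** Toy tailer that tracks the last transaction id it has applied. */
    static class Tailer {
        final Journal journal;
        long lastAppliedTxId;

        Tailer(Journal journal, long lastAppliedTxId) {
            this.journal = journal;
            this.lastAppliedTxId = lastAppliedTxId;
        }

        /** Applies one capped portion of edits; returns how many were applied. */
        long tailOnce() {
            long applied = journal.fetch(lastAppliedTxId);
            lastAppliedTxId += applied;
            return applied;
        }

        /** Models the problematic behaviour: one portion, then return. */
        void catchupSinglePass() {
            tailOnce();
        }

        /** Models a full catch-up: repeat until a pass applies nothing. */
        void catchupUntilDrained() {
            while (tailOnce() > 0) {
                // keep tailing; each pass applies the next portion
            }
        }

        long lag() {
            return journal.lastTxId - lastAppliedTxId;
        }
    }

    public static void main(String[] args) {
        // Standby is ~1M transactions behind, as observed during a checkpoint.
        Journal journal = new Journal(1_000_000L);

        Tailer single = new Tailer(journal, 0L);
        single.catchupSinglePass();
        System.out.println("single pass lag: " + single.lag()); // 995000

        Tailer looped = new Tailer(journal, 0L);
        looped.catchupUntilDrained();
        System.out.println("looped lag: " + looped.lag()); // 0
    }
}
```

With a lag of around 100 tx, as between checkpoints, both variants end with zero lag in this model, which matches the observation that the problem shows up only when the standby is far behind.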

We caught this problem in our 2.10 branch, but it exists in all HDFS versions.
Attaching a unit test to reproduce the problem.

> StandbyNode fails transition to active due to insufficient transaction tailing
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-15323
>                 URL: https://issues.apache.org/jira/browse/HDFS-15323
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, qjm
>    Affects Versions: 2.7.7
>            Reporter: Konstantin Shvachko
>            Priority: Major
>
> StandbyNode is asked to {{transitionToActive()}}. If it has fallen too far 
> behind in tailing journal transactions (from QJM), it can crash with 
> {{IllegalStateException}}.



