[
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhe Zhang updated HDFS-9119:
----------------------------
Attachment: HDFS-9119.00.patch
> Discrepancy between edit log tailing interval and RPC timeout for
> transitionToActive
> ------------------------------------------------------------------------------------
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha
> Affects Versions: 2.7.1
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: HDFS-9119.00.patch
>
>
> {{EditLogTailer}} on the standby NameNode (SbNN) tails edits from the active
> NameNode (ANN) every 2 minutes, but the {{transitionToActive}} RPC call has a
> timeout of 1 minute. If the ANN is under a very intensive metadata workload
> (in particular, many {{AddOp}} and {{MkDir}} operations creating new files
> and directories), the SbNN may be unable to replay the edits accumulated over
> the 2-minute tailing interval within the 1-minute timeout window. When that
> happens, the FailoverController times out and gives up trying to transition
> the standby to active, and the old ANN resumes adding edits. When the SbNN
> finally finishes catching up on the edits and tries to become active, it
> crashes.
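
The timing mismatch described above can be illustrated with a rough back-of-the-envelope model. This is only a sketch, not code from the attached patch: the class name, the method names, and the edit and replay rates are all hypothetical, and it assumes the standby replays edits at a constant rate. Only the 2-minute tailing interval and the 1-minute RPC timeout come from the description.

```java
import java.time.Duration;

// Hypothetical model of the race between edit-log catch-up and the
// transitionToActive RPC timeout. Not taken from the HDFS code base.
final class FailoverTimingSketch {

    // Worst case: the standby last tailed a full interval ago, so it must
    // replay everything the active wrote during that interval.
    static long worstCaseBacklog(long editsPerSecond, Duration tailInterval) {
        return Math.multiplyExact(editsPerSecond, tailInterval.getSeconds());
    }

    // Time to replay the backlog at an assumed constant rate, rounded up
    // to whole seconds.
    static Duration catchUpTime(long backlog, long replayPerSecond) {
        if (replayPerSecond <= 0) {
            throw new IllegalArgumentException("replayPerSecond must be positive");
        }
        long seconds = (backlog + replayPerSecond - 1) / replayPerSecond;
        return Duration.ofSeconds(seconds);
    }

    // True if the worst-case catch-up finishes within the RPC timeout.
    static boolean fitsInTimeout(long editsPerSecond, long replayPerSecond,
                                 Duration tailInterval, Duration rpcTimeout) {
        long backlog = worstCaseBacklog(editsPerSecond, tailInterval);
        return catchUpTime(backlog, replayPerSecond).compareTo(rpcTimeout) <= 0;
    }

    public static void main(String[] args) {
        Duration rpcTimeout = Duration.ofMinutes(1);   // from the description
        Duration tailInterval = Duration.ofMinutes(2); // from the description
        long editsPerSecond = 10_000;  // assumed write rate on the ANN
        long replayPerSecond = 15_000; // assumed replay rate on the SbNN

        long backlog = worstCaseBacklog(editsPerSecond, tailInterval);
        System.out.println("backlog edits: " + backlog);
        System.out.println("catch-up seconds: "
                + catchUpTime(backlog, replayPerSecond).getSeconds());
        System.out.println("fits in timeout: "
                + fitsInTimeout(editsPerSecond, replayPerSecond, tailInterval, rpcTimeout));
    }
}
```

With these assumed rates, the worst-case backlog is 1,200,000 edits and takes 80 seconds to replay, which exceeds the 60-second timeout. With a 30-second tailing interval, the same rates give a 20-second catch-up. The model ignores the edits the old ANN keeps adding after the failover attempt is abandoned.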