[jira] [Updated] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

Zhe Zhang (JIRA) Mon, 12 Oct 2015 08:27:22 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Zhe Zhang updated HDFS-9119:
----------------------------
    Attachment: HDFS-9119.00.patch

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-9119
>                 URL: https://issues.apache.org/jira/browse/HDFS-9119
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.7.1
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-9119.00.patch
>
>
> {{EditLogTailer}} on the standby NameNode (SbNN) tails edits from the active 
> NameNode (ANN) every 2 minutes, but the {{transitionToActive}} RPC call has a 
> timeout of 1 minute.
> If the ANN is under a very intensive metadata workload (in particular, many 
> {{AddOp}} and {{MkDir}} operations that create new files and directories), 
> the SbNN may be unable to apply the edits accumulated over the 2-minute 
> tailing interval within the 1-minute timeout window. If that happens, the 
> FailoverController times out and gives up trying to transition the standby 
> to active. The old ANN then resumes adding more edits. When the SbNN finally 
> finishes catching up on the edits and tries to become active, it crashes.
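
The timing argument in the description can be checked with a little arithmetic. The following is only a rough sketch, not code from HDFS or from the attached patch: the class name, the method names, and the edit rates are hypothetical, and it assumes the standby replays edits at a constant rate. Only the 2-minute tailing interval and the 1-minute RPC timeout come from the description.

```java
public class FailoverCatchUpSketch {
    // Values quoted in the issue description.
    static final long TAIL_INTERVAL_SEC = 120; // edit log tailing interval
    static final long RPC_TIMEOUT_SEC = 60;    // transitionToActive RPC timeout

    /** Worst-case number of edits not yet applied on the standby (hypothetical model). */
    static long worstCaseBacklog(long editsPerSecOnActive, long tailIntervalSec) {
        return editsPerSecOnActive * tailIntervalSec;
    }

    /** Seconds needed to replay the backlog at a constant apply rate, rounded up. */
    static long catchUpSeconds(long backlog, long applyPerSecOnStandby) {
        return (backlog + applyPerSecOnStandby - 1) / applyPerSecOnStandby;
    }

    /** True if catching up would take longer than the RPC timeout. */
    static boolean transitionTimesOut(long editsPerSecOnActive, long applyPerSecOnStandby) {
        long backlog = worstCaseBacklog(editsPerSecOnActive, TAIL_INTERVAL_SEC);
        return catchUpSeconds(backlog, applyPerSecOnStandby) > RPC_TIMEOUT_SEC;
    }

    public static void main(String[] args) {
        long applyPerSec = 10_000; // assumed standby replay rate, for illustration only
        for (long editsPerSec : new long[] {2_000, 5_000, 8_000}) {
            long backlog = worstCaseBacklog(editsPerSec, TAIL_INTERVAL_SEC);
            System.out.printf("active=%d edits/s -> catch-up %d s, timeout exceeded: %b%n",
                    editsPerSec, catchUpSeconds(backlog, applyPerSec),
                    transitionTimesOut(editsPerSec, applyPerSec));
        }
    }
}
```

Under this simplified model, the timeout is exceeded whenever the active generates edits faster than half the rate at which the standby can replay them, because the tailing interval is twice the RPC timeout.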



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
