[jira] [Commented] (HDFS-10365) FullBlockReports retransmission delays NN startup time in large cluster.

Chackaravarthy (JIRA) Wed, 04 May 2016 12:20:32 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271252#comment-15271252
 ]


Chackaravarthy commented on HDFS-10365:
---------------------------------------

Thanks [~cnauroth] for the response. These fixes seem relevant to the issue we 
are currently facing. We will see whether we can backport them.

As a quick fix for 2.6.0, do you think this can be mitigated by tuning any 
configuration? Also, is there a guideline for setting the service handler count 
based on cluster size?

> FullBlockReports retransmission delays NN startup time in large cluster.
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10365
>                 URL: https://issues.apache.org/jira/browse/HDFS-10365
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.6.0
>         Environment: version - hadoop-2.6.0 (hdp-2.2)
> DN - 1200 nodes
>            Reporter: Chackaravarthy
>            Priority: Critical
>
> Whenever the NN is restarted, it takes a very long time to return to a stable 
> state, i.e. the last contact time stays above 1 to 2 minutes continuously for 
> around 3 to 4 hours. This is mainly because most of the DNs time out (60s) in 
> the blockReport (FBR) RPC call and then keep resending the FBR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

