[jira] [Commented] (HDFS-10365) FullBlockReports retransmission delays NN startup time in large cluster.

Chackaravarthy (JIRA) Thu, 05 May 2016 00:16:25 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272001#comment-15272001
 ]


Chackaravarthy commented on HDFS-10365:
---------------------------------------

[~kihwal] These are valuable inputs for us. Thanks.
{noformat}
Yes. Each FBR rpc will be smaller, so the impact of timeout-retransmit will be 
lower. Also NN will process individual report quicker.
{noformat}
By doing so, won't we delay the DN's next heartbeat for too long, since each 
RPC call might take up to 60s? Or is that acceptable because an FBR happens 
only once every 6 hours?
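
To put rough numbers on that question, here is a back-of-the-envelope sketch. It assumes the per-storage FBR RPCs are sent one after another before the next heartbeat goes out, and that every one of them runs to the full 60s timeout. The function names and the storage counts are hypothetical and used only for illustration; the 60s and 6-hour figures are the ones quoted in this issue.

```python
# Worst-case estimate only. Assumes the per-storage FBR RPCs are sent
# sequentially and that each one runs to the full timeout.
# Function names and storage counts are hypothetical.

RPC_TIMEOUT_S = 60            # per-RPC timeout quoted in this issue
FBR_INTERVAL_S = 6 * 60 * 60  # FBR interval quoted in this comment (6 hours)


def worst_case_fbr_time(num_storages: int, rpc_timeout_s: int = RPC_TIMEOUT_S) -> int:
    """Upper bound, in seconds, on the time one DN spends in FBR RPCs."""
    if num_storages < 0:
        raise ValueError("num_storages must be non-negative")
    return num_storages * rpc_timeout_s


def fraction_of_fbr_interval(num_storages: int) -> float:
    """Share of one FBR interval that the worst-case FBR time would occupy."""
    return worst_case_fbr_time(num_storages) / FBR_INTERVAL_S


if __name__ == "__main__":
    for storages in (1, 6, 12):
        seconds = worst_case_fbr_time(storages)
        share = fraction_of_fbr_interval(storages)
        print(f"{storages:2d} storages: up to {seconds:4d}s in FBR RPCs "
              f"({share:.2%} of the FBR interval)")
```

Under these assumptions the worst case grows linearly with the number of storages per DN, so whether it is affordable depends on how that total compares with the heartbeat expectations on the NN side, not only on the 6-hour FBR interval.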

> FullBlockReports retransmission delays NN startup time in large cluster.
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10365
>                 URL: https://issues.apache.org/jira/browse/HDFS-10365
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.6.0
>         Environment: version - hadoop-2.6.0 (hdp-2.2)
> DN - 1200 nodes
>            Reporter: Chackaravarthy
>            Priority: Critical
>
> Whenever the NN is restarted, it takes a very long time to return to a stable 
> state: the last contact time stays above 1 to 2 minutes continuously for 
> around 3 to 4 hours. This is mainly because most of the DNs hit the 60s 
> timeout in the blockReport (FBR) RPC call and then keep resending the FBR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

