[
https://issues.apache.org/jira/browse/HDFS-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271330#comment-15271330
]
Chackaravarthy commented on HDFS-10365:
---------------------------------------
Thanks [~arpitagarwal] for the suggestion. Sure, I will increase
{{dfs.blockreport.initialDelay}} and try it. Do you also suggest decreasing
{{dfs.namenode.service.handler.count}} from 600 (on a 1200-node cluster)?
Other than heartbeats, the most frequent service RPC call will be the IBR
(incremental block report), since there can be multiple IBRs between two
successive heartbeats (interval set to 10s). An IBR also needs the write lock,
so I am not sure whether a handler count of 600 really helps here.
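
A rough back-of-the-envelope sketch of the load figures discussed above. It assumes each DN sends its first FBR at a uniformly random offset within {{dfs.blockreport.initialDelay}}; the candidate delay values (120, 600, 1800 seconds) and the helper name are hypothetical, chosen only for illustration. The DN count and heartbeat interval come from this issue.

```python
def average_arrival_rate(num_datanodes, window_seconds):
    """Average RPCs per second if every DataNode sends one RPC,
    spread uniformly over a window of the given length."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return num_datanodes / window_seconds


NUM_DATANODES = 1200          # from the issue environment
HEARTBEAT_INTERVAL_S = 10     # heartbeat interval mentioned in the comment

# Steady-state heartbeat load on the service RPC port.
heartbeat_rate = average_arrival_rate(NUM_DATANODES, HEARTBEAT_INTERVAL_S)
print(f"heartbeats: {heartbeat_rate:.1f} RPC/s")

# Hypothetical initialDelay values (seconds), not recommendations.
for initial_delay_s in (120, 600, 1800):
    fbr_rate = average_arrival_rate(NUM_DATANODES, initial_delay_s)
    print(f"initialDelay={initial_delay_s}s -> about {fbr_rate:.2f} FBR/s on average")
```

This only estimates average arrival rates. It does not model the write lock, FBR processing time, or retransmissions after the 60s timeout, so it cannot by itself say whether 600 handlers is too many.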
> FullBlockReports retransmission delays NN startup time in large cluster.
> ------------------------------------------------------------------------
>
> Key: HDFS-10365
> URL: https://issues.apache.org/jira/browse/HDFS-10365
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 2.6.0
> Environment: version - hadoop-2.6.0 (hdp-2.2)
> DN - 1200 nodes
> Reporter: Chackaravarthy
> Priority: Critical
>
> Whenever the NN is restarted, it takes a very long time to return to a stable
> state, i.e. the last contact time stays above 1 to 2 minutes continuously for
> around 3 to 4 hours. This is mainly because most of the DNs hit the timeout
> (60s) in the blockReport (FBR) RPC call and then keep resending the FBR.