[ 
https://issues.apache.org/jira/browse/HDFS-10365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271381#comment-15271381
 ] 

Chackaravarthy commented on HDFS-10365:
---------------------------------------

Hi [~kihwal], thanks for the pointers. In our cluster, the NN is configured with 
G1. 
In our case too, the NN comes out of safe mode in 15 to 20 minutes. But it is 
still flooded with FBRs from the DNs, because every first FBR times out: the NN 
processes the report and updates its state (which is why it comes out of safe 
mode) and gets an error only while sending the response.
{quote}
Have datanodes break up full block reports by storage. This makes each FBR RPC 
smaller, so the impact of timeout-retransmit can be lower.
{quote}
Are you suggesting tuning {{dfs.blockreport.split.threshold}} so that each DN 
sends its FBR per storage? Currently the average number of blocks per DN is 
around 200k. So if I reduce {{dfs.blockreport.split.threshold}} from 1 million 
(the default) to 100k or 150k, each FBR RPC would become smaller. Is this what 
you meant?
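
To make the question concrete, here is a minimal sketch of how I understand the split threshold to behave. It is only an illustration, not the actual DataNode code: the function name, the storage IDs and the assumption of 10 storages with 20k blocks each are all hypothetical, and the "below threshold means one combined RPC" rule is my reading of the property description.

```python
from typing import Dict, List

# Default value of dfs.blockreport.split.threshold (1 million blocks).
DEFAULT_SPLIT_THRESHOLD = 1_000_000


def plan_block_report_rpcs(
    blocks_per_storage: Dict[str, int],
    split_threshold: int = DEFAULT_SPLIT_THRESHOLD,
) -> List[Dict[str, int]]:
    """Return one dict per FBR RPC, mapping storage ID -> block count.

    Assumed rule: if the DN's total block count is below the threshold,
    all storages go into a single RPC; otherwise one RPC per storage.
    """
    total_blocks = sum(blocks_per_storage.values())
    if total_blocks < split_threshold:
        # Below the threshold: a single combined report.
        return [dict(blocks_per_storage)]
    # At or above the threshold: one smaller report per storage.
    return [{storage_id: count} for storage_id, count in blocks_per_storage.items()]


# Hypothetical DN: 10 storages with 20k blocks each, about 200k blocks in total.
storages = {f"DS-{i}": 20_000 for i in range(10)}

default_plan = plan_block_report_rpcs(storages)
tuned_plan = plan_block_report_rpcs(storages, split_threshold=150_000)

print(len(default_plan), max(sum(rpc.values()) for rpc in default_plan))
print(len(tuned_plan), max(sum(rpc.values()) for rpc in tuned_plan))
```

Under these assumptions, the default threshold gives one RPC carrying all 200k blocks, while a threshold of 150k gives 10 RPCs of 20k blocks each, so a timed-out RPC would cost less to retransmit.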

> FullBlockReports retransmission delays NN startup time in large cluster.
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10365
>                 URL: https://issues.apache.org/jira/browse/HDFS-10365
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.6.0
>         Environment: version - hadoop-2.6.0 (hdp-2.2)
> DN - 1200 nodes
>            Reporter: Chackaravarthy
>            Priority: Critical
>
> Whenever the NN is restarted, it takes a very long time to return to a stable 
> state, i.e. the last contact time stays above 1 to 2 minutes continuously for 
> around 3 to 4 hours. This is mainly because most of the DNs hit the timeout 
> (60s) in the blockReport (FBR) RPC call and then keep sending the FBR again.



