[ https://issues.apache.org/jira/browse/HDFS-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JiangHua Zhu updated HDFS-15901:
--------------------------------
    Description: 
When a cluster has thousands of nodes and we restart the NameNode service, 
every DataNode sends a full block report to the NameNode. During Safe Mode, 
some DataNodes may send their full block reports to the NameNode multiple 
times. These repeated reports are unnecessary, yet each one still consumes 
NameNode RPC capacity.
As a result, some block report leases fail or time out, and in extreme cases 
the NameNode never leaves Safe Mode.
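To make the cost concrete, here is a simplified Python model of what happens to a repeated full block report while the NameNode is still in its startup phase. It assumes, consistent with the "discarded non-initial block report ... because namenode still in startup phase" log lines in this issue, that the NameNode tracks how many full reports it has accepted per storage and drops any later report until startup ends. The names (`NameNodeModel`, `process_report`, `report_counts`) are hypothetical and are not the HDFS `BlockManager` API. What the model shows: a discarded report does no useful work, but the RPC that carried it has already been received and handled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NameNodeModel:
    """Simplified model of full block report handling during NameNode startup.

    Hypothetical names; this is not the HDFS BlockManager API.
    """

    in_startup: bool = True
    # Full reports already accepted, keyed by (hypothetical) storage ID.
    report_counts: Dict[str, int] = field(default_factory=dict)
    rpcs_handled: int = 0
    reports_processed: int = 0
    reports_discarded: int = 0

    def process_report(self, storage_id: str) -> bool:
        """Handle one full block report RPC; return True if it was processed."""
        # The RPC has already arrived, so it costs capacity either way.
        self.rpcs_handled += 1
        count = self.report_counts.get(storage_id, 0)
        if self.in_startup and count > 0:
            # A non-initial report during startup is thrown away unprocessed.
            self.reports_discarded += 1
            return False
        self.report_counts[storage_id] = count + 1
        self.reports_processed += 1
        return True


# dn1's storage reports three times during startup; only the first is used.
nn = NameNodeModel()
for storage in ["dn1-s1", "dn2-s1", "dn1-s1", "dn1-s1"]:
    nn.process_report(storage)
print(nn.rpcs_handled, nn.reports_processed, nn.reports_discarded)  # 4 2 2
```

In this model the discard saves processing but not the RPC itself, which is why the issue focuses on DataNodes not sending the repeated reports in the first place.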

2021-03-14 08:16:25,873 [78438700] - INFO  [Block report processor:BlockManager@2158] - BLOCK* processReport 0xexxxxxxxx: discarded non-initial block report from DatanodeRegistration(xxxxxxxx:port, datanodeUuid=xxxxxxxx, infoPort=xxxxxxxx, infoSecurePort=xxxxxxxx, ipcPort=xxxxxxxx, storageInfo=lv=xxxxxxxx;nsid=xxxxxxxx;c=0) because namenode still in startup phase
2021-03-14 08:16:31,521 [78444348] - INFO  [Block report processor:BlockManager@2158] - BLOCK* processReport 0xexxxxxxxx: discarded non-initial block report from DatanodeRegistration(xxxxxxxx, datanodeUuid=xxxxxxxx, infoPort=xxxxxxxx, infoSecurePort=xxxxxxxx, ipcPort=xxxxxxxx, storageInfo=lv=xxxxxxxx;nsid=xxxxxxxx;c=0) because namenode still in startup phase

2021-03-13 18:35:38,200 [29191027] - WARN  [Block report processor:BlockReportLeaseManager@311] - BR lease 0xxxxxxxxx is not valid for DN xxxxxxxx, because the DN is not in the pending set.
2021-03-13 18:36:08,143 [29220970] - WARN  [Block report processor:BlockReportLeaseManager@311] - BR lease 0xxxxxxxxx is not valid for DN xxxxxxxx, because the DN is not in the pending set.
2021-03-13 18:36:08,143 [29220970] - WARN  [Block report processor:BlockReportLeaseManager@317] - BR lease 0xxxxxxxxx is not valid for DN xxxxxxxx, because the lease has expired.
2021-03-13 18:36:08,145 [29220972] - WARN  [Block report processor:BlockReportLeaseManager@317] - BR lease 0xxxxxxxxx is not valid for DN xxxxxxxx, because the lease has expired.
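The two WARN reasons in these logs come from block report (BR) lease validation. The following minimal Python model assumes a per-DataNode pending set, a fixed lease duration, and an injectable millisecond clock; the class and method names (`LeaseManagerModel`, `grant`, `check_lease`, `release`) are hypothetical and are not the HDFS `BlockReportLeaseManager` API. It shows how both rejections can arise: a DataNode whose lease was already released or pruned is no longer in the pending set, and a DataNode whose report arrives after the deadline (for example, because the NameNode's RPC handlers are busy with repeated reports) finds its lease expired.

```python
from typing import Callable, Dict, Optional, Tuple


class LeaseManagerModel:
    """Simplified model of block report (BR) lease validation.

    Hypothetical names; this is not the HDFS BlockReportLeaseManager API.
    """

    def __init__(self, lease_ms: int, clock: Callable[[], int]) -> None:
        self.lease_ms = lease_ms
        self.clock = clock  # injectable monotonic clock, in milliseconds
        # DataNodes currently holding a lease: dn -> (lease_id, expiry_ms)
        self.pending: Dict[str, Tuple[int, int]] = {}
        self._next_id = 1

    def grant(self, dn: str) -> int:
        """Give `dn` a fresh lease and add it to the pending set."""
        lease_id = self._next_id
        self._next_id += 1
        self.pending[dn] = (lease_id, self.clock() + self.lease_ms)
        return lease_id

    def check_lease(self, dn: str, lease_id: int) -> Optional[str]:
        """Return None if the lease is valid, else the rejection reason."""
        entry = self.pending.get(dn)
        if entry is None:
            return "the DN is not in the pending set"
        held_id, expiry_ms = entry
        if self.clock() >= expiry_ms:
            # Prune the expired lease; later checks see "not in pending set".
            del self.pending[dn]
            return "the lease has expired"
        if held_id != lease_id:
            return "the lease ID does not match"
        return None

    def release(self, dn: str) -> None:
        """Drop the lease once the DataNode's full report has been processed."""
        self.pending.pop(dn, None)


# Usage with a fake clock so the example is deterministic.
now = [0]
mgr = LeaseManagerModel(lease_ms=1000, clock=lambda: now[0])
lease = mgr.grant("dn1")
print(mgr.check_lease("dn1", lease))  # None
mgr.release("dn1")
print(mgr.check_lease("dn1", lease))  # the DN is not in the pending set
lease2 = mgr.grant("dn2")
now[0] = 1500
print(mgr.check_lease("dn2", lease2))  # the lease has expired
print(mgr.check_lease("dn2", lease2))  # the DN is not in the pending set
```

Because the model prunes an expired lease when it is checked, a retry that reuses the same lease ID is then rejected as not in the pending set.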


> Solve the problem of DN repeated block reports occupying too many RPCs during Safemode
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-15901
>                 URL: https://issues.apache.org/jira/browse/HDFS-15901
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
