[jira] [Commented] (HDFS-15589) Huge PostponedMisreplicatedBlocks can't decrease immediately when start namenode after datanode

Ayush Saxena (Jira) Mon, 21 Sep 2020 21:05:46 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199786#comment-17199786
 ]


Ayush Saxena commented on HDFS-15589:
-------------------------------------

Yes, that is true. For exactly this reason there was an earlier proposal to
trigger block reports after failover, but it did not reach a conclusion,
since the number of datanodes in an actual production cluster will be quite
high and this could increase load on the Namenode.

If you are facing this problem, you can trigger a block report explicitly
using {{dfsadmin}}. Or do you propose a solution to this?
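
As a rough sketch of the {{dfsadmin}} workaround, something like the
following could be used. The helper name trigger_block_reports, the DRY_RUN
switch and the host names are assumptions made up for illustration; 9867 is
the ipcPort shown in the log quoted below. Only the
"hdfs dfsadmin -triggerBlockReport <datanode_host:ipc_port>" command itself
comes from the HDFS CLI.

```shell
# Sketch: ask each datanode to send a full block report to the NameNode.
# trigger_block_reports and DRY_RUN are hypothetical names, not part of HDFS.
# Usage: trigger_block_reports <ipc_port> <datanode_host>...
trigger_block_reports() {
  ipc_port="$1"
  shift
  # Datanodes are handled one at a time, in the order given.
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only print the command so the host list can be reviewed first.
      echo "hdfs dfsadmin -triggerBlockReport ${host}:${ipc_port}"
    else
      hdfs dfsadmin -triggerBlockReport "${host}:${ipc_port}"
    fi
  done
}
```

For example, "DRY_RUN=1 trigger_block_reports 9867 dn1 dn2" prints the two
commands without running them. On a large cluster every triggered full block
report adds load on the Namenode, which is the concern mentioned above, so
it is probably better to go through the datanodes in small batches.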

> Huge PostponedMisreplicatedBlocks can't decrease immediately when start 
> namenode after datanode
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15589
>                 URL: https://issues.apache.org/jira/browse/HDFS-15589
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>         Environment: CentOS 7
>            Reporter: zhengchenyu
>            Priority: Major
>
> In our test cluster, I restarted my namenode and then found many
> PostponedMisreplicatedBlocks, a count that did not decrease immediately.
> I searched the log and found entries like these.
> {code:java}
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> 2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport: 
> from DatanodeRegistration(xx.xx.xx.xx:9866, 
> datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864, 
> infoSecurePort=0, ipcPort=9867, 
> storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
>  reports.length=12
> {code}
> Note: the test cluster only has 6 datanodes.
> You can see that blockReport is called before "Marking all datanodes as
> stale", which is logged by startActiveServices. But
> DatanodeStorageInfo.blockContentsStale is only set to false in a
> blockreport, and startActiveServices then marks all datanodes as stale. So
> the datanodes stay stale until the next blockreport, and
> PostponedMisreplicatedBlocks stays at a huge number.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

