zhengchenyu created HDFS-15589:
----------------------------------
Summary: Huge PostponedMisreplicatedBlocks can't decrease
immediately when start namenode after datanode
Key: HDFS-15589
URL: https://issues.apache.org/jira/browse/HDFS-15589
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Environment: CentOS 7
Reporter: zhengchenyu
In our test cluster, I restarted the namenode. Afterwards I found a large
number of PostponedMisreplicatedBlocks, and the count did not decrease
immediately. I searched the namenode log and found entries like the following.
{code}
2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport:
from DatanodeRegistration(xx.xx.xx.xx:9866,
datanodeUuid=c6a9934f-afd4-4437-b976-fed55173ce57, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
reports.length=12
2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport:
from DatanodeRegistration(xx.xx.xx.xx:9866,
datanodeUuid=aee144f1-2082-4bca-a92b-f3c154a71c65, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
reports.length=12
2020-09-21 17:02:37,029 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport:
from DatanodeRegistration(xx.xx.xx.xx:9866,
datanodeUuid=d152fa5b-1089-4bfc-b9c4-e3a7d98c7a7b, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
reports.length=12
2020-09-21 17:02:37,156 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport:
from DatanodeRegistration(xx.xx.xx.xx:9866,
datanodeUuid=5cffc1fe-ace9-4af8-adfc-6002a7f5565d, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
reports.length=12
2020-09-21 17:02:37,161 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport:
from DatanodeRegistration(xx.xx.xx.xx:9866,
datanodeUuid=9980d8e1-b0d9-4657-b97d-c803f82c1459, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
reports.length=12
2020-09-21 17:02:37,197 DEBUG BlockStateChange: *BLOCK* NameNode.blockReport:
from DatanodeRegistration(xx.xx.xx.xx:9866,
datanodeUuid=77ff3f5e-37f0-405f-a16c-166311546cae, infoPort=9864,
infoSecurePort=0, ipcPort=9867,
storageInfo=lv=-57;cid=CID-9f6d0a32-e51c-459a-9f65-6e7b5791ee25;nsid=1016509846;c=1592578350834),
reports.length=12
{code}
Note: the test cluster has only 6 datanodes.
These block reports were processed before "Marking all datanodes as stale",
which is logged by startActiveServices. DatanodeStorageInfo.blockContentsStale
is only set to false when a block report is processed, and startActiveServices
then marks all datanodes as stale again. So the datanodes stay stale until
their next block report.
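
The ordering problem can be sketched with a small toy model. This is not the
HDFS code: the class StaleOrderingSketch and its members Storage, ToyNameNode,
processBlockReports and staleCount are hypothetical names, and only the
blockContentsStale flag and the startActiveServices step described above are
modeled.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleOrderingSketch {

    /** Toy stand-in for DatanodeStorageInfo; only the stale flag is modeled. */
    static final class Storage {
        boolean blockContentsStale = true;

        // Per the report, only a block report clears the flag.
        void receivedBlockReport() {
            blockContentsStale = false;
        }

        void markStale() {
            blockContentsStale = true;
        }
    }

    /** Toy namenode holding one storage per datanode (hypothetical). */
    static final class ToyNameNode {
        final List<Storage> storages = new ArrayList<>();

        ToyNameNode(int datanodes) {
            for (int i = 0; i < datanodes; i++) {
                storages.add(new Storage());
            }
        }

        // Every datanode sends one block report.
        void processBlockReports() {
            for (Storage s : storages) {
                s.receivedBlockReport();
            }
        }

        // Models the "Marking all datanodes as stale" step.
        void startActiveServices() {
            for (Storage s : storages) {
                s.markStale();
            }
        }

        long staleCount() {
            return storages.stream().filter(s -> s.blockContentsStale).count();
        }
    }

    /** Ordering seen in the log: block reports arrive first. */
    static long staleAfterReportThenActivate(int datanodes) {
        ToyNameNode nn = new ToyNameNode(datanodes);
        nn.processBlockReports();
        nn.startActiveServices();
        return nn.staleCount();
    }

    /** Opposite ordering: block reports arrive after activation. */
    static long staleAfterActivateThenReport(int datanodes) {
        ToyNameNode nn = new ToyNameNode(datanodes);
        nn.startActiveServices();
        nn.processBlockReports();
        return nn.staleCount();
    }

    public static void main(String[] args) {
        System.out.println("report then activate: " + staleAfterReportThenActivate(6));
        System.out.println("activate then report: " + staleAfterActivateThenReport(6));
    }
}
```

With 6 datanodes, the first ordering leaves all 6 storages stale until another
block report arrives, while the second leaves none stale.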