zhengchenyu opened a new pull request #3105:
URL: https://github.com/apache/hadoop/pull/3105


   While speeding up decommissioning, I found that I/O on some DataNodes was 
busy. The load on those hosts was very high, and tens of thousands of data 
transfer threads were running.
   I then found logs like the following.
   ```
   # Log lines for starting the transfer thread
   2021-06-08 13:42:37,620 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.52:9866
   2021-06-08 13:52:36,345 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.7.31:9866
   2021-06-08 14:02:37,197 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.201.4.49:9866, 
datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, 
infoSecurePort=0, ipcPort=9867, 
storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797)
 Starting thread to transfer 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 
10.201.16.50:9866
   # Markers for completed transfers
   2021-06-08 13:54:08,134 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at 
bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.7.52:9866
   2021-06-08 14:10:47,170 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at 
bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted 
BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 
(numBytes=7457424) to /10.201.16.50:9866
   ```
   The logs show that the first DataTransfer thread did not finish until 
13:54:08, but the next DataTransfer for the same block had already started at 
13:52:36.
   If a DataTransfer does not finish within 10 minutes (pending timeout + check 
interval), another DataTransfer for the same block is started. The overlapping 
transfers then put heavy load on the disk and the network.
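
   The overlap can be checked directly from the timestamps in the log excerpt. The following is a minimal Python sketch, not part of the patch: the timestamps and targets are copied from the log lines above, while the helper names (`ts`, `overlaps`, `STARTS`, `DONE`) are hypothetical and exist only for this illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"  # timestamp layout used in the DataNode log


def ts(text):
    """Parse a log timestamp such as '2021-06-08 13:42:37,620'."""
    return datetime.strptime(text, FMT)


# (start time, target) from the "Starting thread to transfer" lines.
STARTS = [
    ("2021-06-08 13:42:37,620", "10.201.7.52:9866"),
    ("2021-06-08 13:52:36,345", "10.201.7.31:9866"),
    ("2021-06-08 14:02:37,197", "10.201.16.50:9866"),
]

# target -> completion time from the "Transmitted" lines.
# The excerpt has no completion line for 10.201.7.31:9866.
DONE = {
    "10.201.7.52:9866": "2021-06-08 13:54:08,134",
    "10.201.16.50:9866": "2021-06-08 14:10:47,170",
}


def overlaps(starts, done):
    """Return (earlier_target, later_target, overlap_seconds) for every
    transfer that started while an earlier one was still running."""
    found = []
    for i, (_, target_a) in enumerate(starts):
        end_a = done.get(target_a)
        if end_a is None:
            continue  # completion not logged in the excerpt, cannot judge
        for start_b, target_b in starts[i + 1:]:
            if ts(start_b) < ts(end_a):
                seconds = (ts(end_a) - ts(start_b)).total_seconds()
                found.append((target_a, target_b, seconds))
    return found


def start_gaps(starts):
    """Seconds between consecutive transfer starts for the block."""
    times = [ts(t) for t, _ in starts]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for earlier, later, seconds in overlaps(STARTS, DONE):
        print(f"transfer to {later} started {seconds:.3f}s "
              f"before transfer to {earlier} finished")
    print("gaps between starts (s):", start_gaps(STARTS))
```

   For this excerpt, the sketch reports one overlap (the transfer to 10.201.7.31 starts while the transfer to 10.201.7.52 is still running) and gaps of roughly 600 seconds between starts, which matches the 10-minute interval described above.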

