[ https://issues.apache.org/jira/browse/HDFS-15715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290727#comment-17290727 ]

zhengchenyu edited comment on HDFS-15715 at 3/26/21, 4:17 AM:
--------------------------------------------------------------

[~hexiaoqiao]

Yeah, no problem. 

Note: I found this problem on a cluster running hadoop-2.7.3, but every version may trigger this bug, so I submitted a patch based on trunk.

1. How was it found?

Due to limited space, I will describe the analysis procedure briefly.

(a) Decommission is very slow

 !image-2021-03-26-12-17-45-500.png!
When decommissioning DataNodes, UnderReplicatedBlocks stays high and PendingDeletionBlocks declines slowly. We can speculate that the ReplicationMonitor is overloaded.

 

(b) Strange logs from the NameNode

The logs below suggest that some logic in chooseTarget is not behaving rationally: the NameNode reports it is "still in need of 0" replicas, yet it still fails to place enough.
{code:java}
2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy

{code}
 

(c) Statistics over many stack dumps

By collecting statistics over many jstack dumps, I found the hot code shown below: the ReplicationMonitor spends its time in chooseRandom, repeatedly walking the network topology through countNumOfAvailableNodes and InnerNode.getLoc.
{code:java}
"org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@64a8c844"
 #34 daemon prio=5 os_prio=0 tid=0x00007f772e03a800 nid=0x6288f runnable 
[0x00007f4507c0f000]
 java.lang.Thread.State: RUNNABLE
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at 
org.apache.hadoop.net.NetworkTopology$InnerNode.getLoc(NetworkTopology.java:296)
 at org.apache.hadoop.net.NetworkTopology.getNode(NetworkTopology.java:556)
 at 
org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:808)
 at 
org.apache.hadoop.net.NetworkTopologyWithMultiDC.countNumOfAvailableNodes(NetworkTopologyWithMultiDC.java:259)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseRandom(BlockPlacementPolicyDefaultWithMultiDC.java:803)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:473)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:300)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefaultWithMultiDC.chooseTarget(BlockPlacementPolicyDefaultWithMultiDC.java:177)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWorkWithMultiDC.chooseTargets(BlockManager.java:4448)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocksWithMultiDC(BlockManager.java:1740)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1419)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4341)
 at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4293)
 at java.lang.Thread.run(Thread.java:748)

{code}
(d) Continue with debug logging enabled

After enabling some debug logs, "is not chosen since the rack has too many chosen nodes" was printed frequently, and the total count of this message was close to the number of DataNode storages in the cluster. We can infer that the hit rate of chooseTarget is very low: nearly every candidate storage is examined and rejected.
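
For reference, this rejection comes from the per-rack check in isGoodTarget. The self-contained sketch below paraphrases that check (illustrative only; names and shape differ across versions): a candidate is rejected once its rack already accounts for maxTargetPerRack chosen entries, and the chosen list includes the block's existing replicas, not just newly picked targets.
{code:java}
import java.util.Arrays;
import java.util.List;

public class RackCheckSketch {
  // Paraphrase (not Hadoop source) of the per-rack check in isGoodTarget:
  // count the candidate itself plus every chosen entry on its rack.
  static boolean hasTooManyChosenNodes(String candidateRack,
                                       List<String> resultRacks,
                                       int maxTargetPerRack) {
    int counter = 1; // the candidate itself
    for (String rack : resultRacks) {
      if (candidateRack.equals(rack)) {
        counter++;
      }
    }
    // true -> "is not chosen since the rack has too many chosen nodes";
    // every rejected storage logs that line, which is why the message count
    // approaches the cluster's DataNodeStorage count
    return counter > maxTargetPerRack;
  }

  public static void main(String[] args) {
    // 2 existing replicas plus 1 newly chosen target, all under /dc1/rack1:
    List<String> results = Arrays.asList("/dc1/rack1", "/dc1/rack1", "/dc1/rack1");
    System.out.println(hasTooManyChosenNodes("/dc1/rack1", results, 3)); // true
  }
}
{code}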

Then I reproduced this problem in a unit test.

 

2. How was it fixed?

I have reproduced this case on the trunk branch.

First apply HDFS-15715.002.patch. This patch does not fix the bug; its TestReplicationPolicyWithMultiStorage reproduces it. When the bug is triggered, many logs like "is not chosen since the rack has too many chosen nodes" are printed.

Then apply HDFS-15715.002.patch.addendum, which fixes the bug: UnderReplicatedBlocks declines normally.

> ReplicatorMonitor performance degrades, when the storagePolicy of many file 
> are not match with their real datanodestorage 
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-15715
>                 URL: https://issues.apache.org/jira/browse/HDFS-15715
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.7.3, 3.2.1
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>             Fix For: 3.3.1
>
>         Attachments: HDFS-15715.001.patch, HDFS-15715.002.patch, 
> HDFS-15715.002.patch.addendum, image-2021-03-26-12-17-45-500.png
>
>
> One of our NameNodes has 300M files and blocks. Ordinarily this NameNode should not be under heavy load, but we found that RPC processing time stayed high and decommissioning was very slow.
>  
> I searched the metrics and found that under-replicated blocks stayed high. Then I took jstack dumps of the NameNode and found that 'InnerNode.getLoc' is the hot spot. I think chooseTarget may fail to find targets, resulting in the performance degradation. Considering HDFS-10453, I guessed that some logic triggers a scenario where chooseTarget cannot find a proper target.
> Then I enabled some debug logging. (Of course I revised the code so that only isGoodTarget logs at debug level, because enabling all of BlockPlacementPolicy's debug logging is dangerous.) I found that "the rack has too many chosen nodes" is hit. Then I found logs like this:
> {code}
> 2020-12-04 12:13:56,345 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2020-12-04 12:14:03,843 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> {code} 
> Then through some debugging and simulation, I found the reason and reproduced this problem.
> The reason is that some developers use the COLD storage policy together with the mover, but setting the storage policy and running the mover are asynchronous operations, so some files' real DataNode storages do not match their storagePolicy.
> Let me simulate this process. Suppose /tmp/a is created and gets 2 replicas on DISK, and its storage policy is then set to COLD. When some logic (for example, decommission) triggers copying of this block, chooseTarget uses chooseStorageTypes to work out which storages are really needed. Here the size of requiredStorageTypes returned by chooseStorageTypes is 3, while the size of 'result' is 2: 3 means 3 ARCHIVE storages are needed, and 2 means the block already has 2 DISK storages. chooseTarget will therefore try to choose 3 targets. Choosing the first target succeeds, but when choosing the second target the variable 'counter' in isGoodTarget is 4, which is larger than maxTargetPerRack, which is 3. So every DataNode storage is skipped, which results in the bad performance. The sketch below walks through this arithmetic.
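>
> To make the walkthrough concrete, here is a self-contained sketch (my simplification, not HDFS code; it assumes every candidate is on the same rack as the existing replicas, so the per-rack counter applies to each attempt):
> {code:java}
> public class ChooseTargetSimulation {
>   public static void main(String[] args) {
>     int requiredTargets = 3;   // requiredStorageTypes: 3 ARCHIVE storages needed
>     int existingReplicas = 2;  // the block's 2 DISK replicas, already counted
>     int maxTargetPerRack = 3;  // as computed for this cluster
>
>     int chosen = 0;
>     for (int attempt = 1; attempt <= requiredTargets; attempt++) {
>       // isGoodTarget counts the candidate itself plus the existing replicas
>       // and already-chosen targets on the same rack
>       int counter = 1 + existingReplicas + chosen;
>       if (counter > maxTargetPerRack) {
>         System.out.println("target " + attempt + ": rejected, counter=" + counter);
>       } else {
>         chosen++;
>         System.out.println("target " + attempt + ": chosen, counter=" + counter);
>       }
>     }
>     // target 1: chosen, counter=3; targets 2 and 3: rejected, counter=4.
>     // In the real cluster each rejected attempt scans every storage, hence
>     // the performance degradation.
>   }
> }
> {code}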
> I think chooseStorageTypes needs to take 'result' into account: when an existing replica does not satisfy the storage policy's demand, we need to remove it from 'result'. A sketch of this idea follows.
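>
> The sketch below illustrates the idea only (my simplification, not the actual patch; StorageType here is a stand-in enum): replicas whose storage type the policy no longer wants are dropped from 'result', so they no longer inflate the per-rack counter.
> {code:java}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.List;
>
> public class RemoveUnmatchedReplicas {
>   enum StorageType { DISK, ARCHIVE } // stand-in for the real StorageType
>
>   // Keep in 'result' only replicas whose storage type the policy still
>   // wants; anything else (e.g. a DISK replica under COLD) is removed.
>   static List<StorageType> filterResult(List<StorageType> result,
>                                         List<StorageType> wanted) {
>     List<StorageType> remaining = new ArrayList<>(wanted);
>     List<StorageType> kept = new ArrayList<>();
>     for (StorageType t : result) {
>       if (remaining.remove(t)) {
>         kept.add(t); // this replica satisfies part of the policy
>       }
>     }
>     return kept;
>   }
>
>   public static void main(String[] args) {
>     // COLD wants 3 ARCHIVE replicas; the block currently has 2 DISK replicas.
>     List<StorageType> result = Arrays.asList(StorageType.DISK, StorageType.DISK);
>     List<StorageType> wanted =
>         Arrays.asList(StorageType.ARCHIVE, StorageType.ARCHIVE, StorageType.ARCHIVE);
>     System.out.println(filterResult(result, wanted)); // [] -> counter not inflated
>   }
> }
> {code}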
> I changed it in this way and verified it in my unit test, which solved the problem.


