[ https://issues.apache.org/jira/browse/HDFS-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844880#comment-17844880 ]

ASF GitHub Bot commented on HDFS-17515:
---------------------------------------

zhengchenyu opened a new pull request, #6805:
URL: https://github.com/apache/hadoop/pull/6805

   ### Description of PR
   
   https://issues.apache.org/jira/browse/HDFS-17515
   
   ### How was this patch tested?
   
   unit test
   
   ### For code changes:
   
   - Add a pendingECBlockReplicationWithoutTargets counter to DatanodeDescriptor.
   - Increment pendingECBlockReplicationWithoutTargets when an ErasureCodingWork is constructed, and decrement it once targets are set.
   - Because the real source datanode cannot be determined at scheduleReconstruction time, pendingECBlockReplicationWithoutTargets is updated on every candidate source datanode. To compensate for this over-counting, getNumberOfBlocksToBeReplicated applies a factor to appropriately lower the value.
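A minimal sketch of the bookkeeping the bullets above describe, assuming illustrative class names and an assumed factor value (the real patch touches Hadoop's actual DatanodeDescriptor and ErasureCodingWork classes, which differ):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names follow the PR description, not Hadoop's real code.
class DatanodeDescriptorSketch {
    // EC reconstruction work scheduled on this node but not yet given targets.
    private final AtomicInteger pendingECBlockReplicationWithoutTargets = new AtomicInteger();
    // Work that already has targets assigned.
    private final AtomicInteger ecBlocksToBeReplicated = new AtomicInteger();
    // The counter is bumped on every candidate source datanode, so a factor < 1
    // damps the over-count. The value 0.5 is an assumption for illustration.
    static final double PENDING_WITHOUT_TARGETS_FACTOR = 0.5;

    // Called when an ErasureCodingWork is constructed with this node as a source.
    void onErasureCodingWorkConstructed() {
        pendingECBlockReplicationWithoutTargets.incrementAndGet();
    }

    // Called once targets are chosen for the work.
    void onTargetsChosen() {
        pendingECBlockReplicationWithoutTargets.decrementAndGet();
        ecBlocksToBeReplicated.incrementAndGet();
    }

    // Load figure used to throttle further scheduling on this node: confirmed
    // work plus a damped share of the not-yet-targeted work.
    int getNumberOfBlocksToBeReplicated() {
        return ecBlocksToBeReplicated.get()
                + (int) (pendingECBlockReplicationWithoutTargets.get()
                         * PENDING_WITHOUT_TARGETS_FACTOR);
    }
}
```

With this shape, scheduling can compare getNumberOfBlocksToBeReplicated() against the stream limits before handing the node more EC work.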
   
   




> Erasure Coding: ErasureCodingWork is not effectively limited during a block 
> reconstruction cycle.
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17515
>                 URL: https://issues.apache.org/jira/browse/HDFS-17515
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chenyu Zheng
>            Assignee: Chenyu Zheng
>            Priority: Major
>
> In a block reconstruction cycle, ErasureCodingWork is not effectively 
> limited. I added some debug logging that fires whenever ecBlocksToBeReplicated 
> reaches an integer multiple of 100.
> {code:java}
> 2024-05-09 10:46:06,986 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: 
> ecBlocksToBeReplicated for IP:PORT already have 100 blocks
> 2024-05-09 10:46:06,987 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: 
> ecBlocksToBeReplicated for IP:PORT already have 200 blocks
> ...
> 2024-05-09 10:46:06,992 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: 
> ecBlocksToBeReplicated for IP:PORT already have 2000 blocks
> 2024-05-09 10:46:06,992 DEBUG 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: 
> ecBlocksToBeReplicated for IP:PORT already have 2100 blocks {code}
> During a block reconstruction cycle, ecBlocksToBeReplicated increases from 0 
> to 2100, which is much larger than replicationStreamsHardLimit. This is unfair 
> and creates a bias toward replicating EC blocks.
> For non-EC blocks this is not a problem: pendingReplicationWithoutTargets is 
> incremented when work is scheduled, and once it grows too large, no further 
> work is scheduled for that node.
>  
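The non-EC throttling the reporter describes can be sketched as a simple admission check; names and the limit value are illustrative, assuming the hard limit plays the role of Hadoop's replicationStreamsHardLimit:

```java
// Hedged sketch of the non-EC throttle: skip scheduling for a node whose
// pending (targetless) plus already-scheduled replication work has reached
// the hard limit. Not Hadoop's actual code; names are illustrative.
class ReplicationThrottleSketch {
    // Illustrative value standing in for replicationStreamsHardLimit.
    static final int REPLICATION_STREAMS_HARD_LIMIT = 4;

    // Returns true only while the node still has headroom under the limit.
    static boolean shouldScheduleWork(int pendingReplicationWithoutTargets,
                                      int blocksScheduled) {
        return pendingReplicationWithoutTargets + blocksScheduled
                < REPLICATION_STREAMS_HARD_LIMIT;
    }
}
```

The issue is that the EC path lacked an equivalent of this check, letting ecBlocksToBeReplicated climb far past the limit in one cycle.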



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
