[jira] [Commented] (HDFS-17542) EC: Optimize the EC block reconstruction.

ASF GitHub Bot (Jira) Wed, 31 Jul 2024 00:03:04 -0700


    [ https://issues.apache.org/jira/browse/HDFS-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869826#comment-17869826 ]


ASF GitHub Bot commented on HDFS-17542:
---------------------------------------

zhengchenyu opened a new pull request, #6915:
URL: https://github.com/apache/hadoop/pull/6915

   ### Description of PR
   
   https://issues.apache.org/jira/browse/HDFS-17542
   
   The current reconstruction process for EC blocks is built on the one for
ordinary contiguous blocks. It is mainly implemented through the work
constructed by computeReconstructionWorkForBlocks, and can be roughly divided
into three phases:
   
   * scheduleReconstruction
   * chooseTargets
   * validateReconstructionWork
   
   For ordinary contiguous blocks:
   
   * (1) scheduleReconstruction
   Select srcNodes as the sources of the block copy according to the status of
each replica of the block.
   * (2) chooseTargets
   Select the target of the copy.
   * (3) validateReconstructionWork
   Add the copy command to the srcNode. The srcNode receives the command through
its heartbeat and copies the block from src to target.
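   To make the three phases concrete, here is a toy Java model of the
contiguous-block path. It is a sketch under simplifying assumptions: the
Replica and CopyCommand types and the "first live, non-busy replica" source
rule are hypothetical and do not reflect the real BlockManager or
BlockReconstructionWork API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of the three reconstruction phases for a contiguous
 * (replicated) block. Not the real HDFS classes or selection rules.
 */
public class ContiguousReconstructionSketch {

    /** A replica of the block on a DataNode. */
    record Replica(String node, boolean live, boolean busy) {}

    /** A copy command queued on the source DataNode, delivered on heartbeat. */
    record CopyCommand(String source, String target) {}

    /** (1) scheduleReconstruction: pick the first live, non-busy replica as source. */
    static String scheduleReconstruction(List<Replica> replicas) {
        for (Replica r : replicas) {
            if (r.live() && !r.busy()) {
                return r.node();
            }
        }
        return null; // no usable source, so no work is scheduled
    }

    /** (2) chooseTargets: pick the first candidate that does not already hold a replica. */
    static String chooseTarget(List<Replica> replicas, List<String> candidates) {
        for (String c : candidates) {
            boolean alreadyHolds = replicas.stream().anyMatch(r -> r.node().equals(c));
            if (!alreadyHolds) {
                return c;
            }
        }
        return null;
    }

    /** (3) validateReconstructionWork: queue the copy command on the source node. */
    static List<CopyCommand> validateReconstructionWork(String source, String target) {
        List<CopyCommand> queued = new ArrayList<>();
        if (source != null && target != null) {
            queued.add(new CopyCommand(source, target));
        }
        return queued;
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
            new Replica("dn1", false, false),  // dead replica
            new Replica("dn2", true, true),    // live but busy
            new Replica("dn3", true, false));  // usable source
        String src = scheduleReconstruction(replicas);
        String dst = chooseTarget(replicas, List.of("dn3", "dn4"));
        System.out.println(validateReconstructionWork(src, dst));
    }
}
```

For contiguous blocks the decision in phase (3) is trivial: there is only one
kind of work (a copy), so once (1) and (2) succeed, (3) always yields a task.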
   
   For EC blocks:
   Phases (1) and (2) are nearly the same. However, whether an EC block needs a
simple block copy or a full block reconstruction is only decided in (3). When
some storages are busy, (3) may produce no work at all, which leads to the
problem described in [HDFS-17516](https://issues.apache.org/jira/browse/HDFS-17516).
Even if no block copy or block reconstruction is generated,
pendingReconstruction and neededReconstruction are still updated, so nothing
happens for the block until it times out, which wastes a scheduling
opportunity.
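   The scheduling problem can be illustrated with a toy model in which a deque
and a set stand in for neededReconstruction and pendingReconstruction. The
class and method names are hypothetical, and the real bookkeeping (replication
streams, timeouts, priorities) is omitted; the only point is that the block
moves to pending even when phase (3) queues no task.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical toy model of the wasted scheduling opportunity. The deque and
 * set stand in for neededReconstruction and pendingReconstruction.
 */
public class WastedSchedulingSketch {
    final Deque<String> needed = new ArrayDeque<>();
    final Set<String> pending = new HashSet<>();

    /**
     * Phase (3) in the current design: the copy-vs-reconstruct decision is
     * made here and may yield no task when sources are busy.
     * Returns the number of tasks actually queued on DataNodes.
     */
    int validateReconstructionWork(String block, boolean sourcesBusy) {
        int tasks = sourcesBusy ? 0 : 1;
        // Bookkeeping runs whether or not a task was queued, so a block with
        // no work still sits in pending until its timeout expires.
        needed.remove(block);
        pending.add(block);
        return tasks;
    }

    public static void main(String[] args) {
        WastedSchedulingSketch s = new WastedSchedulingSketch();
        s.needed.add("blk_1");
        int tasks = s.validateReconstructionWork("blk_1", true);
        System.out.println(tasks + " " + s.pending + " " + s.needed);
    }
}
```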
   
   Because the decision between block copy and block reconstruction is made in
(3), the otherwise unnecessary liveBusyBlockIndices and
excludeReconstructedIndices had to be introduced. Many known bugs are related
to this logic, and this complexity should be avoided.
   
   ### How was this patch tested?
   
   Unit tests, and testing in a cluster.
   
   ### For code changes:
   
   - Move the work of deciding whether to copy or reconstruct blocks from 
validateReconstructionWork to scheduleReconstruction.
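   A minimal sketch of what deciding in phase (1) could look like, assuming a
block group with k data units and k + m internal blocks in total, at most one
replica per internal block index, and a simple busy/decommissioning flag per
DataNode. The Kind, Internal and Decision types and shouldMarkPending are
hypothetical; this is not the code of PR #6915. It only shows that copy,
reconstruction and "no work" can all be told apart before any pending
bookkeeping happens.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: classify EC reconstruction work in phase (1),
 * scheduleReconstruction, instead of phase (3). Not the actual HDFS code.
 * Assumes each internal block index appears at most once.
 */
public class EcScheduleSketch {

    enum Kind { NONE, COPY, RECONSTRUCT }

    /** State of one internal block (one index of the block group). */
    record Internal(int index, boolean live, boolean busy, boolean decommissioning) {}

    /** Outcome of phase (1): the kind of work and the source indices to use. */
    record Decision(Kind kind, List<Integer> sources) {}

    /**
     * @param dataUnits  k, the number of internal blocks needed to decode
     * @param totalUnits k + m, the number of indices in the block group
     */
    static Decision scheduleReconstruction(List<Internal> internals,
                                           int dataUnits, int totalUnits) {
        boolean[] present = new boolean[totalUnits];
        List<Integer> usable = new ArrayList<>();  // live and not busy
        List<Integer> toCopy = new ArrayList<>();  // usable, on decommissioning nodes
        for (Internal b : internals) {
            if (!b.live()) {
                continue;
            }
            present[b.index()] = true;
            if (b.busy()) {
                continue; // busy sources are filtered out here, up front
            }
            usable.add(b.index());
            if (b.decommissioning()) {
                toCopy.add(b.index());
            }
        }
        boolean anyMissing = false;
        for (boolean p : present) {
            anyMissing |= !p;
        }
        if (anyMissing) {
            // Decoding needs at least k readable internal blocks.
            if (usable.size() < dataUnits) {
                return new Decision(Kind.NONE, List.of());
            }
            return new Decision(Kind.RECONSTRUCT, List.copyOf(usable.subList(0, dataUnits)));
        }
        // Every index is present: a plain copy is enough if some index sits
        // on a decommissioning node that is not busy.
        if (!toCopy.isEmpty()) {
            return new Decision(Kind.COPY, List.copyOf(toCopy));
        }
        return new Decision(Kind.NONE, List.of());
    }

    /** Only blocks with real work should be moved to pendingReconstruction. */
    static boolean shouldMarkPending(Decision d) {
        return d.kind() != Kind.NONE;
    }

    public static void main(String[] args) {
        // 3 data + 2 parity units; all indices live, index 2 is decommissioning.
        List<Internal> allPresent = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            allPresent.add(new Internal(i, true, false, i == 2));
        }
        System.out.println(scheduleReconstruction(allPresent, 3, 5));

        // Index 4 is missing and indices 0 and 1 are busy: only two usable
        // sources remain, so no work is scheduled and pending is untouched.
        List<Internal> tooBusy = List.of(
            new Internal(0, true, true, false), new Internal(1, true, true, false),
            new Internal(2, true, false, false), new Internal(3, true, false, false));
        Decision d = scheduleReconstruction(tooBusy, 3, 5);
        System.out.println(d + " markPending=" + shouldMarkPending(d));
    }
}
```

In this sketch a NONE decision simply leaves the block in neededReconstruction,
and validateReconstructionWork would no longer need busy-index state carried
over from phase (1).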
   




> EC: Optimize the EC block reconstruction.
> -----------------------------------------
>
>                 Key: HDFS-17542
>                 URL: https://issues.apache.org/jira/browse/HDFS-17542
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chenyu Zheng
>            Assignee: Chenyu Zheng
>            Priority: Major
>              Labels: pull-request-available
>
> The current reconstruction process for EC blocks is built on the one for
> ordinary contiguous blocks. It is mainly implemented through the work
> constructed by computeReconstructionWorkForBlocks, and can be roughly
> divided into three phases:
>  * scheduleReconstruction
>  * chooseTargets
>  * validateReconstructionWork
> For ordinary contiguous blocks:
>  * (1) scheduleReconstruction
> Select srcNodes as the sources of the block copy according to the status of
> each replica of the block.
>  * (2) chooseTargets
> Select the target of the copy.
>  * (3) validateReconstructionWork
> Add the copy command to the srcNode. The srcNode receives the command
> through its heartbeat and copies the block from src to target.
> For EC blocks:
> Phases (1) and (2) are nearly the same. However, whether an EC block needs a
> simple block copy or a full block reconstruction is only decided in (3). When
> some storages are busy, (3) may produce no work at all, which leads to the
> problem described in HDFS-17516. Even if no block copy or block
> reconstruction is generated, pendingReconstruction and neededReconstruction
> are still updated, so nothing happens for the block until it times out,
> which wastes a scheduling opportunity.
> Because the decision between block copy and block reconstruction is made in
> (3), the otherwise unnecessary liveBusyBlockIndices and
> excludeReconstructedIndices had to be introduced. Many known bugs are
> related to this logic, and this complexity should be avoided.
> Improvements:
>  * Move the work of deciding whether to copy or reconstruct blocks from (3) 
> to (1).
> This change also makes it easier to implement the explicit specification of
> the reconstruction block index mentioned in HDFS-16874.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

