[jira] [Updated] (HDDS-15403) Build a second EC reconstruction procedure to target faster time to recovery

Ryan Blough (Jira) Thu, 25 Jun 2026 13:11:04 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ryan Blough updated HDDS-15403:
-------------------------------
    Description: 
This Jira is to track and implement a different erasure coding reconstruction 
procedure with a focus on completing the reconstruction task faster in exchange 
for a bigger resource footprint.

The EC reconstruction procedure we currently use is sequential, by stripe, and 
has a minimal resource footprint. The two loops of the reconstruction are:

1. Gather the data from the DNs. This is done concurrently for one block group, 
but fetches data per-stripe to load into off-heap memory.

2. Do reconstruction per stripe:
a. Reconstruct one stripe into memory.
b. Write that stripe over the network to a target node.
c. On confirmation of write, move to the next stripe.
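
The loop above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Ozone's implementation: single-parity XOR stands in for Reed-Solomon, and `FakeNetwork`, `fetch_stripe`, `write_stripe`, `make_stripes`, and `reconstruct_per_stripe` are hypothetical names invented for the example. It counts network calls to show that they grow with the number of stripes.

```python
from functools import reduce


def xor_cells(cells):
    """XOR equal-length byte strings (single-parity stand-in for Reed-Solomon)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*cells))


def make_stripes(num_stripes, data_cells=2, cell_size=4):
    """Build toy stripes: `data_cells` data cells followed by one XOR parity cell."""
    stripes = []
    for s in range(num_stripes):
        data = [bytes((s * 7 + d * 3 + k) % 256 for k in range(cell_size))
                for d in range(data_cells)]
        stripes.append(data + [xor_cells(data)])
    return stripes


class FakeNetwork:
    """Hypothetical in-memory stand-in for the source and target datanodes."""

    def __init__(self, stripes, lost_index):
        self.stripes = stripes
        self.lost_index = lost_index  # index of the replica being rebuilt
        self.target = []              # cells received by the target node
        self.reads = 0                # per-stripe fetch rounds
        self.writes = 0               # per-stripe write rounds

    def fetch_stripe(self, i):
        # One fetch round per stripe (the real reads to each DN run concurrently).
        self.reads += 1
        return [c for j, c in enumerate(self.stripes[i]) if j != self.lost_index]

    def write_stripe(self, cell):
        self.writes += 1
        self.target.append(cell)
        return True  # write confirmation


def reconstruct_per_stripe(net, num_stripes):
    """Fetch, rebuild, and write one stripe at a time, waiting on each write."""
    for i in range(num_stripes):
        surviving = net.fetch_stripe(i)     # network read
        rebuilt = xor_cells(surviving)      # only one stripe held in memory
        if not net.write_stripe(rebuilt):   # network write, then next stripe
            raise IOError(f"write of stripe {i} was not confirmed")
```

In this sketch, rebuilding N stripes costs N fetch rounds and N confirmed writes, while only one stripe's worth of data is buffered at any time.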

This approach consumes minimal resources at each step: the memory footprint is 
limited to one container chunk size, and the work runs on a single thread.

However, it also places network I/O at both ends of a loop that iterates many 
times.
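
As a rough way to reason about this cost, the two approaches can be compared with a simple latency model. Everything here is an assumption made for illustration, not a measurement: N is the number of stripes, s the bytes moved per stripe, \ell a fixed per-request network latency, B_r and B_w the read and write bandwidths, and t_dec(x) the time to decode x bytes.

```latex
\begin{align*}
T_{\text{loop}} &\approx N \left( 2\ell + \frac{s}{B_r} + t_{\text{dec}}(s) + \frac{s}{B_w} \right) \\
T_{\text{bulk}} &\approx 2\ell + \frac{N s}{B_r} + t_{\text{dec}}(N s) + \frac{N s}{B_w} \\
T_{\text{loop}} - T_{\text{bulk}} &\approx 2 (N - 1)\, \ell
  \quad \text{if } t_{\text{dec}}(N s) = N\, t_{\text{dec}}(s)
\end{align*}
```

Under these assumptions the loop pays the fixed latency on every iteration, so the gap grows with the stripe count, while buffered memory grows from roughly s to roughly N s. Whether decode time really scales linearly with the work unit is an open question.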

The concept of this second reconstruction method is to complete all steps in a 
single stage each without looping. The tradeoff will be faster time to recovery 
for the individual container in exchange for a larger resource footprint 
(namely enough memory to store the full-size container).
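
A staged, non-looping version might look roughly like the sketch below. It is an illustration only: single-parity XOR again stands in for Reed-Solomon, and `FakeSources`, `fetch_all`, `write_all`, and `reconstruct_bulk` are hypothetical names, not Ozone APIs. Each stage runs once over the whole replica, and the function reports how many bytes were buffered to make the memory tradeoff visible.

```python
from functools import reduce


def xor_bytes(buffers):
    """XOR equal-length byte strings (single-parity stand-in for Reed-Solomon)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*buffers))


class FakeSources:
    """Hypothetical in-memory stand-in for the source and target datanodes."""

    def __init__(self, replicas, lost_index):
        self.replicas = replicas      # one full-length byte string per replica
        self.lost_index = lost_index  # index of the replica being rebuilt
        self.target = None            # data received by the target node
        self.reads = 0                # bulk fetch rounds
        self.writes = 0               # bulk write rounds

    def fetch_all(self):
        # One fetch round that returns every surviving replica in full.
        self.reads += 1
        return [r for j, r in enumerate(self.replicas) if j != self.lost_index]

    def write_all(self, data):
        self.writes += 1
        self.target = data
        return True  # write confirmation


def reconstruct_bulk(net):
    """Run each step exactly once; return the peak number of buffered bytes."""
    surviving = net.fetch_all()       # stage 1: gather everything
    rebuilt = xor_bytes(surviving)    # stage 2: decode the whole replica
    if not net.write_all(rebuilt):    # stage 3: single write to the target
        raise IOError("bulk write was not confirmed")
    # All surviving data plus the rebuilt replica are resident at once.
    return sum(len(r) for r in surviving) + len(rebuilt)
```

Here the number of network rounds no longer depends on the stripe count, but the buffered bytes scale with the full replica size rather than with one stripe.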

After a first pass to establish end-to-end single-threaded behavior in 
comparison with the loop, additional considerations are likely to include 
async, multithreading, and revisiting the single-chunk work unit depending on 
how conventional Reed-Solomon (the algorithm in libhadoop) scales with work 
unit size.

  was:
This Jira is to track and implement a different erasure coding reconstruction 
procedure with a focus on completing the reconstruction task faster in exchange 
for a bigger resource footprint.

The EC reconstruction procedure we currently use is sequential, by stripe, and 
has a minimal resource footprint. The two loops of the reconstruction are:

1. Gather the data from the DNs. This is done concurrently for one block group, 
but fetches data per-stripe to load into off-heap memory.

2. Do reconstruction per stripe:
a. Reconstruct one stripe into memory.
b. Write that stripe over the network to a target node.
c. On confirmation of write, move to the next stripe.

This has the advantage of consuming minimal resources at each step, with memory 
footprint being limited to one container chunk size, and consuming a single 
thread.

However, it also places network I/O at both ends of a loop that iterates many 
times.

The concept of this second reconstruction method is to complete steps 1-4 in a 
single stage each. The tradeoff will be faster time to recovery for the 
individual container in exchange for a larger resource footprint (namely enough 
memory to store the full-size container).

After a first pass to establish end-to-end single-threaded behavior in 
comparison with the loop, additional considerations are likely to include 
async, multithreading, and revisiting the single-chunk work unit depending on 
how conventional Reed-Solomon (the algorithm in libhadoop) scales with work 
unit size.


> Build a second EC reconstruction procedure to target faster time to recovery
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-15403
>                 URL: https://issues.apache.org/jira/browse/HDDS-15403
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: EC, Ozone Datanode
>            Reporter: Ryan Blough
>            Assignee: Ryan Blough
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
