[jira] [Assigned] (HDDS-15403) Build a second EC reconstruction procedure to target faster time to recovery

Ryan Blough (Jira) Wed, 27 May 2026 17:18:07 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ryan Blough reassigned HDDS-15403:
----------------------------------

    Assignee: Ryan Blough

> Build a second EC reconstruction procedure to target faster time to recovery
> ----------------------------------------------------------------------------
>
>                 Key: HDDS-15403
>                 URL: https://issues.apache.org/jira/browse/HDDS-15403
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: EC, Ozone Datanode
>            Reporter: Ryan Blough
>            Assignee: Ryan Blough
>            Priority: Major
>
> This Jira is to track and implement a different erasure coding reconstruction 
> procedure with a focus on completing the reconstruction task faster.
> The EC reconstruction procedure we currently use is sequential, by chunk, and 
> has a minimal resource footprint. The core loop of the reconstruction is:
>  # Fetch a container chunk over the network.
>  # Load the chunk into off-heap memory.
>  # Do reconstruction on the chunk.
>  # Write that chunk over the network to the target nodes.
>  # On confirmation of write, iterate the loop.
> This has the advantage of consuming minimal resources at each step: the 
> memory footprint is limited to the size of one container chunk, and the 
> loop uses a single thread.
> However, it also places a network round trip at each end of a loop that 
> iterates many times.
> The concept of this second reconstruction method is to run each of steps 
> 1-4 once, as a single stage over the whole container, instead of once per 
> chunk. The tradeoff will be faster time to recovery for the individual 
> container in exchange for a larger resource footprint (namely enough memory 
> to store the full-size container).
> After a first pass to establish end-to-end single-threaded behavior in 
> comparison with the loop, additional considerations are likely to include 
> async, multithreading, and revisiting the single-chunk work unit depending on 
> how conventional Reed-Solomon (the algorithm in libhadoop) scales with work 
> unit size.
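
The two procedures described above can be contrasted with a small, self-contained sketch. It is illustrative only and is not Ozone code: `FakeNode`, `make_sources`, `reconstruct_per_chunk`, and `reconstruct_staged` are hypothetical names, the "network" is an in-memory object that counts round trips, and a single XOR parity stands in for the Reed-Solomon decode.

```python
from functools import reduce


def xor_bytes(blocks):
    """XOR equal-length byte strings (toy stand-in for a Reed-Solomon decode)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class FakeNode:
    """Hypothetical in-memory replica holder that counts network round trips."""

    def __init__(self, chunks=None):
        self.chunks = list(chunks or [])
        self.round_trips = 0

    def read_chunk(self, i):
        self.round_trips += 1
        return self.chunks[i]

    def read_all(self):
        self.round_trips += 1
        return list(self.chunks)

    def write_chunk(self, data):
        self.round_trips += 1
        self.chunks.append(data)

    def write_all(self, chunks):
        self.round_trips += 1
        self.chunks.extend(chunks)


def make_sources(n_chunks, chunk_size=4):
    """Build two data replicas plus XOR parity; pretend replica 0 was lost."""
    d0 = [bytes([i]) * chunk_size for i in range(n_chunks)]
    d1 = [bytes([i + 100]) * chunk_size for i in range(n_chunks)]
    parity = [xor_bytes(pair) for pair in zip(d0, d1)]
    return d0, [FakeNode(d1), FakeNode(parity)]


def reconstruct_per_chunk(sources, target, n_chunks):
    """Sequential loop: fetch, decode, and write one chunk per iteration."""
    for i in range(n_chunks):
        survivors = [s.read_chunk(i) for s in sources]  # steps 1-2
        rebuilt = xor_bytes(survivors)                  # step 3
        target.write_chunk(rebuilt)                     # step 4; returning is step 5


def reconstruct_staged(sources, target):
    """Staged variant: each step runs once over the whole container."""
    fetched = [s.read_all() for s in sources]           # all chunks held in memory
    rebuilt = [xor_bytes(stripe) for stripe in zip(*fetched)]
    target.write_all(rebuilt)
```

In this toy model the per-chunk loop costs one read round trip per source and one write round trip per chunk, while the staged variant costs one of each per container and instead holds every fetched chunk in memory at once. Real transfers would be streamed or batched rather than sent as one message, so the counts only illustrate the shape of the tradeoff.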



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
