[
https://issues.apache.org/jira/browse/HDDS-15412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wei-Chiu Chuang reassigned HDDS-15412:
--------------------------------------
Assignee: Wei-Chiu Chuang
> [DataNode] Disk volume-specific container replication thread pool
> -----------------------------------------------------------------
>
> Key: HDDS-15412
> URL: https://issues.apache.org/jira/browse/HDDS-15412
> Project: Apache Ozone
> Issue Type: Task
> Components: EC, Ozone Datanode
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Major
> Attachments: Screenshot 2026-05-26 at 11.30.41 PM.png, Screenshot
> 2026-05-26 at 11.31.00 PM.png, Screenshot 2026-05-27 at 6.57.48 AM.png
>
>
> We noticed a pattern during EC decommission: replication starts fast, with
> every disk running at full speed, but this does not last. After a while,
> overall replication gradually slows until only one disk runs at full
> speed.
> !Screenshot 2026-05-26 at 11.30.41 PM.png|width=100%!
> !Screenshot 2026-05-26 at 11.31.00 PM.png|width=100%!
> !Screenshot 2026-05-27 at 6.57.48 AM.png|width=100%!
> It turns out that when a disk is assigned multiple replication tasks
> concurrently, those tasks slow down and delay other replication tasks, even
> though the latter are assigned to different disks. Eventually, the rest of
> the disks become idle while that particular disk is full of replication tasks.
> Proposal: Building on top of HDDS-15073, we need to create a separate thread
> pool for each disk, so that if replication tasks assigned to one disk
> start to back off, they don't interfere with replication tasks assigned to
> other disks.
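
A minimal sketch of the per-disk pool idea, to make the proposal concrete. It assumes replication tasks can be keyed by a volume identifier. The class name PerVolumeReplicationPools, the submit and poolCount methods, and the threadsPerVolume parameter are all hypothetical and do not correspond to existing Ozone or HDDS-15073 code. Each volume lazily gets its own fixed-size executor, so tasks that stall on one disk can only occupy that disk's threads.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical sketch: one fixed-size replication pool per disk volume. */
final class PerVolumeReplicationPools implements AutoCloseable {
  // Number of worker threads dedicated to each volume (assumed tunable).
  private final int threadsPerVolume;
  // Volume identifier -> executor that serves only that volume.
  private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

  PerVolumeReplicationPools(int threadsPerVolume) {
    if (threadsPerVolume < 1) {
      throw new IllegalArgumentException("threadsPerVolume must be >= 1");
    }
    this.threadsPerVolume = threadsPerVolume;
  }

  /** Runs the task on the pool of the given volume, creating it on first use. */
  <T> CompletableFuture<T> submit(String volumeId, Supplier<T> task) {
    ExecutorService pool = pools.computeIfAbsent(
        volumeId, id -> Executors.newFixedThreadPool(threadsPerVolume));
    return CompletableFuture.supplyAsync(task, pool);
  }

  /** Number of volumes that currently have a pool. */
  int poolCount() {
    return pools.size();
  }

  /** Stops all per-volume pools, interrupting running tasks. */
  @Override
  public void close() {
    pools.values().forEach(ExecutorService::shutdownNow);
  }
}
```

With this layout, a task submitted for one volume that blocks indefinitely leaves the threads of every other volume free, which is the isolation property the proposal asks for. A real implementation would also need to decide how tasks are mapped to volumes and how pool sizes are configured; neither is specified here.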