Wei-Chiu Chuang created HDDS-15412:
--------------------------------------

             Summary: [DataNode] Disk volume-specific container replication 
thread pool
                 Key: HDDS-15412
                 URL: https://issues.apache.org/jira/browse/HDDS-15412
             Project: Apache Ozone
          Issue Type: Task
          Components: EC, Ozone Datanode
            Reporter: Wei-Chiu Chuang
         Attachments: Screenshot 2026-05-26 at 11.30.41 PM.png, Screenshot 
2026-05-26 at 11.31.00 PM.png, Screenshot 2026-05-27 at 6.57.48 AM.png

We noticed a pattern during EC decommission: replication starts fast, with 
every disk running at full speed, but it doesn't last. After a while, overall 
replication gradually slows until only one disk runs at full speed.
 !Screenshot 2026-05-26 at 11.30.41 PM.png|width=100%! 
 !Screenshot 2026-05-26 at 11.31.00 PM.png|width=100%! 
 !Screenshot 2026-05-27 at 6.57.48 AM.png|width=100%! 

It turns out that when a disk is assigned multiple replication tasks 
concurrently, those tasks slow down, delaying other replication tasks even 
though they are assigned to different disks. Eventually, the rest of the disks 
become idle while that particular disk is full of replication tasks.

Proposal: Building on top of HDDS-15073, we need to create a separate thread 
pool for each disk, so that if replication tasks assigned to one disk start 
to back off, they don't interfere with replication tasks assigned to other 
disks.
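
A minimal sketch of the idea, not the actual Ozone implementation: each volume 
gets its own fixed-size pool, created lazily on first use. The class name 
PerVolumeReplicationPools, the string volume IDs, and the threadsPerVolume 
parameter are all hypothetical and only illustrate the isolation property; how 
this would hook into the changes from HDDS-15073 is not shown.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical illustration: one thread pool per disk volume, so that slow
 * tasks queued for one volume cannot occupy threads needed by another.
 */
final class PerVolumeReplicationPools implements AutoCloseable {
  // Assumed tuning knob: number of replication threads dedicated to each volume.
  private final int threadsPerVolume;
  // Volume ID -> dedicated pool; entries are created on first submission.
  private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

  PerVolumeReplicationPools(int threadsPerVolume) {
    this.threadsPerVolume = threadsPerVolume;
  }

  /** Runs the task on the pool that belongs to the target volume only. */
  <T> Future<T> submit(String volumeId, Callable<T> task) {
    ExecutorService pool = pools.computeIfAbsent(
        volumeId, id -> Executors.newFixedThreadPool(threadsPerVolume));
    return pool.submit(task);
  }

  /** Stops all per-volume pools, interrupting tasks that are still running. */
  @Override
  public void close() {
    pools.values().forEach(ExecutorService::shutdownNow);
  }
}
```

With a single shared pool, enough slow tasks for one disk can hold every 
worker thread; with the layout above, a stalled task for one volume only ties 
up that volume's threads, and tasks for other volumes keep running.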



