Re: [PR] HDDS-13817. Add fixed-delay scheduling mode to SCMBlockDeletingService [ozone]

via GitHub Fri, 14 Nov 2025 00:03:50 -0800


ChenSammi commented on code in PR #9185:
URL: https://github.com/apache/ozone/pull/9185#discussion_r2526269047



##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfig.java:
##########
@@ -138,6 +139,15 @@ public class ScmConfig extends ReconfigurableConfig {
   )
   private int transactionToDNsCommitMapLimit = 5000000;
 
+  @Config(key = "hdds.scm.block.deleting.service.scheduling.mode",
+      defaultValue = "FIXED_RATE",
+      type = ConfigType.STRING,
+      tags = { ConfigTag.SCM, ConfigTag.DELETION },
+      description = "Scheduling mode for the block deleting service. For detailed, " +
+          "see org.apache.hadoop.hdds.utils.SchedulingMode"
+  )
+  private String blockDeletingServiceSchedulingMode = SchedulingMode.FIXED_RATE.name();

Review Comment:
   I had an offline discussion with Xi about the background of this
improvement. It arises from real observation data that Xi shared. RocksDB
access is very fast in most cases, but there can be unexpected slow spikes.
   
   Many Ozone services currently use BackgroundService to schedule their tasks
at a fixed rate, and SCMBlockDeletingService is one of them.
SCMBlockDeletingService uses DeletedBlockLogImpl, which updates deletion
transactions in the DB under a lock. Add, Get, and HandleCommandResponseFromDN
all compete for this lock. If block deletion tasks run one after another
without any delay, they hold the lock most of the time. That leaves less time
for the Add operation, which serves requests from OM, and for handling block
deletion command responses from DNs, so these two types of requests can pile
up in SCM. The new fixed-delay mode solves this problem by giving every
operation a similar chance to acquire the lock and proceed.
   
   The cost of fixed delay is that the block deletion task runs less
frequently than with fixed rate, which I think is not a big problem. If the
task finishes very quickly, fixed delay and fixed rate behave almost the same.
If the task takes more time than expected, fixed delay handles the case better
than fixed rate and keeps overall SCM activity smooth.
   Also, with fixed delay it is safe to tune the
"hdds.scm.block.deleting.service.interval" value; we don't need to worry about
whether the value is too short to cover one task run.
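
   To make the comparison concrete, here is a minimal sketch that simulates
the idle gap between consecutive runs under both modes. It is not code from
this PR: the class and method names are hypothetical, and the fixed-rate model
assumes the semantics documented for
java.util.concurrent.ScheduledExecutorService (a late run starts as soon as
the previous one ends, and runs never overlap). The idle gap stands in for the
time during which the lock is available to other operations.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Ozone.
final class SchedulingModeSketch {

  /**
   * Returns the idle time (ms) before each run after the first one.
   * Run 0 is assumed to start at t = 0.
   */
  static long[] idleGaps(boolean fixedDelay, long intervalMs, long[] durationsMs) {
    if (durationsMs.length == 0) {
      return new long[0];
    }
    long[] gaps = new long[durationsMs.length - 1];
    long prevEnd = durationsMs[0];
    for (int k = 1; k < durationsMs.length; k++) {
      long start = fixedDelay
          // Fixed delay: always wait a full interval after the previous run ends.
          ? prevEnd + intervalMs
          // Fixed rate: use the nominal slot k * interval, or start right after
          // the previous run if that slot has already passed.
          : Math.max(k * intervalMs, prevEnd);
      gaps[k - 1] = start - prevEnd;
      prevEnd = start + durationsMs[k];
    }
    return gaps;
  }

  public static void main(String[] args) {
    long intervalMs = 1000;
    // The second run models an unexpected slow spike (made-up numbers).
    long[] durationsMs = {200, 3500, 200, 200, 200};
    System.out.println("fixed rate : "
        + Arrays.toString(idleGaps(false, intervalMs, durationsMs)));
    System.out.println("fixed delay: "
        + Arrays.toString(idleGaps(true, intervalMs, durationsMs)));
  }
}
```

   With these made-up durations, the fixed-rate gaps are [800, 0, 0, 0]: after
the slow run, the following runs start back to back until the schedule catches
up. The fixed-delay gaps stay at [1000, 1000, 1000, 1000].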
   
   Actually, I suggest we consider changing the default mode to fixed delay,
as it adapts better to different deletion cases. cc @ashishkumar50
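
   For reference, a rough sketch of how a configured mode string could be
mapped onto the two standard ScheduledExecutorService calls. This is only an
illustration under assumptions: the Mode enum below is a hypothetical stand-in
for org.apache.hadoop.hdds.utils.SchedulingMode (the FIXED_DELAY constant is
assumed from the PR title), and the helper names are not taken from the PR.

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the PR's implementation.
final class ModeDispatchSketch {

  /** Assumed stand-in for the SchedulingMode enum referenced in the diff. */
  enum Mode { FIXED_RATE, FIXED_DELAY }

  /** Parses the configured string; throws IllegalArgumentException if unknown. */
  static Mode parse(String configured) {
    return Mode.valueOf(configured.trim().toUpperCase(Locale.ROOT));
  }

  /** Schedules the task with the JDK call that matches the mode. */
  static ScheduledFuture<?> schedule(ScheduledExecutorService exec,
      Runnable task, Mode mode, long intervalMs) {
    switch (mode) {
      case FIXED_DELAY:
        // The interval is measured from the end of one run to the start of the next.
        return exec.scheduleWithFixedDelay(task, 0, intervalMs, TimeUnit.MILLISECONDS);
      case FIXED_RATE:
      default:
        // The interval is measured between nominal start times.
        return exec.scheduleAtFixedRate(task, 0, intervalMs, TimeUnit.MILLISECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger runs = new AtomicInteger();
    ScheduledFuture<?> future =
        schedule(exec, runs::incrementAndGet, parse("fixed_delay"), 50);
    Thread.sleep(300);
    future.cancel(false);
    exec.shutdownNow();
    System.out.println("runs: " + runs.get());
  }
}
```

   Changing the default would then only mean changing the default string that
is passed to parse; the scheduling call itself stays the same.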
   


