ChenSammi commented on code in PR #9185:
URL: https://github.com/apache/ozone/pull/9185#discussion_r2526269047
##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ScmConfig.java:
##########
@@ -138,6 +139,15 @@ public class ScmConfig extends ReconfigurableConfig {
)
private int transactionToDNsCommitMapLimit = 5000000;
+ @Config(key = "hdds.scm.block.deleting.service.scheduling.mode",
+ defaultValue = "FIXED_RATE",
+ type = ConfigType.STRING,
+ tags = { ConfigTag.SCM, ConfigTag.DELETION },
+ description = "Scheduling mode for the block deleting service. For detailed, " +
+ "see org.apache.hadoop.hdds.utils.SchedulingMode"
+ )
+ private String blockDeletingServiceSchedulingMode = SchedulingMode.FIXED_RATE.name();
Review Comment:
Had an offline discussion with Xi about the background of this improvement.
It arises from real observation data that Xi shared. RocksDB access is very
fast in most cases, but there can be unexpected slow spikes.
A lot of Ozone services currently use BackgroundService to schedule their
tasks at a fixed rate; SCMBlockDeletingService is one of them.
SCMBlockDeletingService uses DeletedBlockLogImpl, which updates deletion
transactions in the DB under a lock. Add/Get/HandleCommandResponseFromDN
all compete for this lock. If block deletion tasks run one after another
without any delay, they hold the lock most of the time. That leaves less time
for the Add operation, which serves requests from OM, and for handling block
deletion command responses from DN, so these two types of requests could pile
up in SCM. The new fixed delay mode can solve this problem, so that every
operation has a similar chance to get the lock and proceed.
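To make the difference concrete, here is a minimal sketch that is not taken
from the PR. It simulates the start times of a single periodic task under the
two `ScheduledExecutorService` semantics (`scheduleAtFixedRate` and
`scheduleWithFixedDelay`) and reports the idle gap between consecutive runs,
which is roughly the window in which other lock users could get in. The class
name, method names, interval, and task durations are all made up for
illustration.

```java
import java.util.Arrays;

public class SchedulingModeSketch {

  /**
   * Fixed rate: run i is due at i * interval, but a run never overlaps the
   * previous one, so a late run starts as soon as the previous run ends.
   */
  public static long[] fixedRateStarts(long intervalMs, long[] durationsMs) {
    long[] starts = new long[durationsMs.length];
    long prevEnd = 0;
    for (int i = 0; i < durationsMs.length; i++) {
      long nominal = i * intervalMs;
      starts[i] = Math.max(nominal, prevEnd);
      prevEnd = starts[i] + durationsMs[i];
    }
    return starts;
  }

  /** Fixed delay: each run starts one full interval after the previous run ends. */
  public static long[] fixedDelayStarts(long intervalMs, long[] durationsMs) {
    long[] starts = new long[durationsMs.length];
    long prevEnd = 0;
    for (int i = 0; i < durationsMs.length; i++) {
      starts[i] = (i == 0) ? 0 : prevEnd + intervalMs;
      prevEnd = starts[i] + durationsMs[i];
    }
    return starts;
  }

  /** Idle time between the end of run i-1 and the start of run i. */
  public static long[] idleGaps(long[] starts, long[] durationsMs) {
    long[] gaps = new long[Math.max(0, starts.length - 1)];
    for (int i = 1; i < starts.length; i++) {
      gaps[i - 1] = starts[i] - (starts[i - 1] + durationsMs[i - 1]);
    }
    return gaps;
  }

  public static void main(String[] args) {
    long intervalMs = 100;
    // Hypothetical run times: the second run hits a slow DB spike.
    long[] durationsMs = {20, 250, 20, 20};

    long[] rate = fixedRateStarts(intervalMs, durationsMs);
    long[] delay = fixedDelayStarts(intervalMs, durationsMs);

    System.out.println("fixed rate gaps:  "
        + Arrays.toString(idleGaps(rate, durationsMs)));
    System.out.println("fixed delay gaps: "
        + Arrays.toString(idleGaps(delay, durationsMs)));
  }
}
```

With these made-up numbers, fixed rate gives gaps of [80, 0, 0]: after the
slow run, the next two runs start back to back. Fixed delay gives
[100, 100, 100], so there is always a full interval between runs.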
The cost of fixed delay is that the block deletion task will run less
frequently than with fixed rate, which I think is not a big problem. If the
task finishes very quickly, fixed delay and fixed rate behave almost the same.
If the task takes more time than expected, fixed delay handles the case better
than fixed rate and keeps overall SCM activity smooth.
Also, with fixed delay it's safe to tune the
"hdds.scm.block.deleting.service.interval" value; we don't need to worry about
whether the value is too short to cover one task run.
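As a rough illustration of how a configured mode string could select the
scheduling call, here is a sketch that uses only the standard
`ScheduledExecutorService` API. The `Mode` enum is a hypothetical stand-in for
org.apache.hadoop.hdds.utils.SchedulingMode; the `FIXED_DELAY` constant name
and the `schedule` helper are assumptions, not the PR's actual code.

```java
import java.util.Locale;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ModeDispatchSketch {

  /** Hypothetical stand-in for the PR's SchedulingMode enum. */
  public enum Mode { FIXED_RATE, FIXED_DELAY }

  /**
   * Schedules the task according to the configured mode string.
   * Throws IllegalArgumentException for an unknown mode name.
   */
  public static ScheduledFuture<?> schedule(ScheduledExecutorService exec,
      Runnable task, long intervalMs, String configuredMode) {
    Mode mode = Mode.valueOf(configuredMode.trim().toUpperCase(Locale.ROOT));
    switch (mode) {
      case FIXED_DELAY:
        // The interval is measured from the end of one run to the start of
        // the next, so a short interval cannot make runs go back to back.
        return exec.scheduleWithFixedDelay(
            task, 0, intervalMs, TimeUnit.MILLISECONDS);
      case FIXED_RATE:
      default:
        // The interval is measured between nominal start times; a slow run
        // makes the following runs start late, with no pause in between.
        return exec.scheduleAtFixedRate(
            task, 0, intervalMs, TimeUnit.MILLISECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor();
    ScheduledFuture<?> future = schedule(
        exec, () -> System.out.println("deletion pass"), 50, "FIXED_DELAY");
    Thread.sleep(120);
    future.cancel(false);
    exec.shutdown();
  }
}
```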
Actually, I suggest we consider changing the default mode to fixed
delay, as fixed delay adapts better to different deletion cases.
cc @ashishkumar50