Re: [PR] Do not kill a task if offsets are inconsistent but publish from another group is pending (druid)

via GitHub Mon, 09 Mar 2026 11:08:31 -0700


jtuglu1 commented on code in PR #19091:
URL: https://github.com/apache/druid/pull/19091#discussion_r2907108205



##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/CostBasedAutoScaler.java:
##########
@@ -62,10 +63,17 @@ public class CostBasedAutoScaler implements SupervisorTaskAutoScaler
   public static final String LAG_COST_METRIC = "task/autoScaler/costBased/lagCost";
   public static final String IDLE_COST_METRIC = "task/autoScaler/costBased/idleCost";
   public static final String OPTIMAL_TASK_COUNT_METRIC = "task/autoScaler/costBased/optimalTaskCount";
+  public static final String INVALID_METRICS_COUNT = "task/autoScaler/costBased/invalidMetrics";
 
   static final int MAX_INCREASE_IN_PARTITIONS_PER_TASK = 2;
   static final int MAX_DECREASE_IN_PARTITIONS_PER_TASK = MAX_INCREASE_IN_PARTITIONS_PER_TASK * 2;
 
+  /**
+   * If average partition lag crosses this value and the processing rate is
+   * still zero, scaling actions are skipped and an alert is raised.
+   */
+  static final int MAX_IDLENESS_PARTITION_LAG = 10_000;

Review Comment:
   > But if the lag exceeds this value AND processing rate is zero, that
   > indicates something is wrong with the tasks.
   
   I guess my point is that we have topics where, by the time lag exceeds 10k,
   it is probably too late to detect that something is up (we've already broken
   an SLO). We can leave it for now to avoid config bloat, but I don't really
   like hardcoding this stuff. IMO, when we start adding more tweakable
   configs/magic numbers to the solution, it points at a larger underlying issue.
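
For illustration only, here is a rough sketch of the check described in the Javadoc above, with the threshold passed in as a parameter instead of being read from the constant. The class `IdleLagCheck`, the method `shouldSkipScaling`, and its parameters are hypothetical names that do not come from the PR, and treating "crosses this value" as a strict greater-than is an assumption.

```java
// Hypothetical sketch; not the code in CostBasedAutoScaler.
final class IdleLagCheck
{
  // Same value as the constant added in the diff above.
  static final int MAX_IDLENESS_PARTITION_LAG = 10_000;

  /**
   * Returns true when scaling should be skipped: average partition lag is
   * above the threshold while the processing rate is still zero.
   * Assumption: "crosses" means strictly greater than the threshold.
   */
  static boolean shouldSkipScaling(
      double avgPartitionLag,
      double processingRate,
      long maxIdlenessPartitionLag
  )
  {
    return avgPartitionLag > maxIdlenessPartitionLag && processingRate <= 0.0;
  }

  public static void main(String[] args)
  {
    // Hardcoded threshold: lag of 5k with zero throughput is not flagged.
    System.out.println(shouldSkipScaling(5_000, 0.0, MAX_IDLENESS_PARTITION_LAG));
    // A tighter, per-topic threshold (hypothetical value) flags the same state.
    System.out.println(shouldSkipScaling(5_000, 0.0, 1_000));
  }
}
```

With the threshold as a parameter, a topic with a tight SLO could use a lower value than 10k, at the cost of one more tunable.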



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

