Fly-Style commented on code in PR #18991:
URL: https://github.com/apache/druid/pull/18991#discussion_r2786380123


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/WeightedCostFunction.java:
##########
@@ -73,12 +68,19 @@ public CostResult computeCost(
       }
     } else {
       // Lag recovery time is decreasing by adding tasks and increasing by ejecting tasks.
+      // In case of increasing lag, we apply an amplification factor to reflect the urgency of addressing lag.
       // Caution: we rely only on the metrics, the real issues may be absolutely different, up to hardware failure.
-      lagRecoveryTime = metrics.getAggregateLag() / (proposedTaskCount * avgProcessingRate);
+      if (metrics.getAggregateLag() <= 0) {
+        lagRecoveryTime = 0;
+      } else {
+        final double lagPerPartition = metrics.getAggregateLag() / metrics.getPartitionCount();
+        final double amplification = Math.max(1.0, 1.0 + LAG_AMPLIFICATION_MULTIPLIER * Math.log(lagPerPartition));

Review Comment:
   > Would it be possible to see how the autoscaler does without the amplification?
   
   No. Both the mathematical analysis and testing showed that it behaves poorly under several lag conditions: the autoscaler generally scales out, but not far enough, and the lag then stagnates.
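To illustrate the point above, here is a minimal, self-contained Java sketch comparing the lag recovery estimate with and without the logarithmic amplification. It is not the code in `WeightedCostFunction`: the diff is cut off before `amplification` is used, so the sketch ASSUMES it multiplies the original `aggregateLag / (proposedTaskCount * avgProcessingRate)` estimate, and the multiplier value `0.5` is a hypothetical placeholder for `LAG_AMPLIFICATION_MULTIPLIER`. The class and method names (`LagRecoverySketch`, `baseRecoveryTime`, `amplification`, `amplifiedRecoveryTime`) are likewise hypothetical.

```java
// Hypothetical, self-contained sketch; not Druid's WeightedCostFunction.
final class LagRecoverySketch {
  // HYPOTHETICAL value: the real LAG_AMPLIFICATION_MULTIPLIER is not shown in the diff.
  static final double LAG_AMPLIFICATION_MULTIPLIER = 0.5;

  /** Unamplified estimate, mirroring the line removed in the diff. */
  static double baseRecoveryTime(double aggregateLag, int proposedTaskCount, double avgProcessingRate) {
    if (aggregateLag <= 0) {
      return 0;
    }
    return aggregateLag / (proposedTaskCount * avgProcessingRate);
  }

  /** Logarithmic amplification as in the added lines; never drops below 1.0. */
  static double amplification(double aggregateLag, int partitionCount) {
    if (aggregateLag <= 0) {
      return 1.0;
    }
    final double lagPerPartition = aggregateLag / partitionCount;
    // Math.log is the natural logarithm. Below 1 record per partition the log is
    // negative, and the max(1.0, ...) clamp keeps the factor at 1.0.
    return Math.max(1.0, 1.0 + LAG_AMPLIFICATION_MULTIPLIER * Math.log(lagPerPartition));
  }

  /** ASSUMPTION: the amplification factor multiplies the base estimate. */
  static double amplifiedRecoveryTime(double aggregateLag, int partitionCount,
                                      int proposedTaskCount, double avgProcessingRate) {
    return amplification(aggregateLag, partitionCount)
        * baseRecoveryTime(aggregateLag, proposedTaskCount, avgProcessingRate);
  }

  public static void main(String[] args) {
    final int partitions = 10;
    final int tasks = 4;
    final double ratePerTask = 1000.0; // records/second per task (illustrative)
    for (double lag : new double[] {100, 10_000, 1_000_000}) {
      System.out.printf("lag=%.0f base=%.2fs amplified=%.2fs%n",
          lag,
          baseRecoveryTime(lag, tasks, ratePerTask),
          amplifiedRecoveryTime(lag, partitions, tasks, ratePerTask));
    }
  }
}
```

Because of the logarithm, the extra weight grows slowly: with the placeholder multiplier, 100 records of lag per partition gives a factor of about 3.3, and 100,000 per partition about 6.8. Under the multiplicative assumption, a large backlog therefore makes the lag term noticeably more expensive, which pushes a cost comparison toward more tasks, while the clamp leaves small lags unamplified.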



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
