kfaraz commented on code in PR #18991:
URL: https://github.com/apache/druid/pull/18991#discussion_r2786074776
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/WeightedCostFunction.java:
##########
@@ -73,12 +68,19 @@ public CostResult computeCost(
}
} else {
// Lag recovery time is decreasing by adding tasks and increasing by ejecting tasks.
+ // In case of increasing lag, we apply an amplification factor to reflect the urgency of addressing lag.
// Caution: we rely only on the metrics, the real issues may be absolutely different, up to hardware failure.
- lagRecoveryTime = metrics.getAggregateLag() / (proposedTaskCount * avgProcessingRate);
+ if (metrics.getAggregateLag() <= 0) {
+ lagRecoveryTime = 0;
+ } else {
+ final double lagPerPartition = metrics.getAggregateLag() / metrics.getPartitionCount();
+ final double amplification = Math.max(1.0, 1.0 + LAG_AMPLIFICATION_MULTIPLIER * Math.log(lagPerPartition));
+ lagRecoveryTime = metrics.getAggregateLag() * amplification / (proposedTaskCount * avgProcessingRate);
Review Comment:
Maybe rename this field to `lagRecoveryTimeInSeconds` so that we are always
able to think in terms of the relevant units.
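
For illustration only, here is a minimal standalone sketch of what the computation might look like with the unit carried in the name. It assumes the lag is measured in records and `avgProcessingRate` in records per second per task, which is what would make the result a number of seconds; the diff does not state those units. The class name `LagRecoverySketch`, the method signature, and the value given to `LAG_AMPLIFICATION_MULTIPLIER` are hypothetical and not taken from the PR.

```java
public final class LagRecoverySketch {
    // Hypothetical value; the real constant in WeightedCostFunction is not shown in the diff.
    static final double LAG_AMPLIFICATION_MULTIPLIER = 0.05;

    /**
     * Estimated time to drain the lag, in seconds, assuming aggregateLag is in
     * records and avgProcessingRate is in records per second per task.
     */
    static double lagRecoveryTimeInSeconds(
            double aggregateLag,
            int partitionCount,
            int proposedTaskCount,
            double avgProcessingRate
    ) {
        if (aggregateLag <= 0) {
            return 0;
        }
        final double lagPerPartition = aggregateLag / partitionCount;
        // Math.log is negative for lagPerPartition < 1, so Math.max keeps the factor at 1.0 or above.
        final double amplification =
                Math.max(1.0, 1.0 + LAG_AMPLIFICATION_MULTIPLIER * Math.log(lagPerPartition));
        return aggregateLag * amplification / (proposedTaskCount * avgProcessingRate);
    }
}
```

With the unit in the name, a call site such as `lagRecoveryTimeInSeconds(4, 4, 2, 1.0)` reads unambiguously as a duration rather than a raw score.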
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]