kfaraz commented on code in PR #18991:
URL: https://github.com/apache/druid/pull/18991#discussion_r2786065644
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/WeightedCostFunction.java:
##########
@@ -73,12 +68,19 @@ public CostResult computeCost(
}
} else {
// Lag recovery time is decreasing by adding tasks and increasing by ejecting tasks.
+ // In case of increasing lag, we apply an amplification factor to reflect the urgency of addressing lag.
// Caution: we rely only on the metrics, the real issues may be absolutely different, up to hardware failure.
- lagRecoveryTime = metrics.getAggregateLag() / (proposedTaskCount * avgProcessingRate);
+ if (metrics.getAggregateLag() <= 0) {
+ lagRecoveryTime = 0;
+ } else {
+ final double lagPerPartition = metrics.getAggregateLag() / metrics.getPartitionCount();
+ final double amplification = Math.max(1.0, 1.0 + LAG_AMPLIFICATION_MULTIPLIER * Math.log(lagPerPartition));
+ lagRecoveryTime = metrics.getAggregateLag() * amplification / (proposedTaskCount * avgProcessingRate);
+ }
}
- final double predictedIdleRatio = estimateIdleRatio(metrics, proposedTaskCount, config.getHighLagThreshold());
- final double idleCost = proposedTaskCount * metrics.getTaskDurationSeconds() * predictedIdleRatio;
+ final double predictedIdleRatio = estimateIdleRatio(metrics, proposedTaskCount);
+ final double idleCost = proposedTaskCount * predictedIdleRatio;
Review Comment:
Removing the time component here makes it difficult to compare `lagCost` and
`idleCost`, since they are no longer expressed in comparable units. The former
is measured in time, whereas the latter is dimensionless (or perhaps uses an
implicit duration of 1 second).
I guess it would be more meaningful to use something like:
```suggestion
final double idleCost = proposedTaskCount * lagRecoveryTime * predictedIdleRatio;
```
This way, `idleCost` would denote the seconds wasted being idle while lag is
being recovered. I suppose this is easier to relate to `lagCost` than
multiplying by the task duration.
Could you please try the calculations with this update?
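
To make the units argument concrete, here is a minimal, self-contained sketch
of the two `idleCost` variants next to the amplified `lagRecoveryTime` from the
diff. The class name `IdleCostSketch`, the method names, the sample inputs, and
the value of `LAG_AMPLIFICATION_MULTIPLIER` are assumptions for illustration
only; the actual constant and `estimateIdleRatio` are not shown in this hunk,
so the idle ratio is simply passed in.

```java
public final class IdleCostSketch {
    // Assumed value for illustration; the real constant is not visible in the diff.
    static final double LAG_AMPLIFICATION_MULTIPLIER = 0.05;

    /** Lag recovery time in seconds, mirroring the amplified formula in the diff. */
    static double lagRecoveryTime(double aggregateLag, int partitionCount,
                                  int proposedTaskCount, double avgProcessingRate) {
        if (aggregateLag <= 0) {
            return 0;
        }
        final double lagPerPartition = aggregateLag / partitionCount;
        final double amplification =
            Math.max(1.0, 1.0 + LAG_AMPLIFICATION_MULTIPLIER * Math.log(lagPerPartition));
        return aggregateLag * amplification / (proposedTaskCount * avgProcessingRate);
    }

    /** Variant in the PR: task count times a ratio, so it carries no time unit. */
    static double idleCostDimensionless(int proposedTaskCount, double predictedIdleRatio) {
        return proposedTaskCount * predictedIdleRatio;
    }

    /** Variant suggested in this review: task-seconds spent idle during lag recovery. */
    static double idleCostDuringRecovery(int proposedTaskCount, double lagRecoveryTime,
                                         double predictedIdleRatio) {
        return proposedTaskCount * lagRecoveryTime * predictedIdleRatio;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 1000 records of lag, 10 partitions, 100 records/s per task.
        for (int tasks = 1; tasks <= 10; tasks++) {
            final double recovery = lagRecoveryTime(1000, 10, tasks, 100);
            final double idleRatio = 0.2; // placeholder for estimateIdleRatio(...)
            System.out.printf("tasks=%d recovery=%.3fs idle(dimensionless)=%.3f idle(recovery)=%.3f%n",
                tasks, recovery,
                idleCostDimensionless(tasks, idleRatio),
                idleCostDuringRecovery(tasks, recovery, idleRatio));
        }
    }
}
```

One thing this sketch makes visible: with the suggested form, `idleCost` is
zero whenever `lagRecoveryTime` is zero, so the no-lag case may need separate
handling.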
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]