Fly-Style commented on code in PR #18991:
URL: https://github.com/apache/druid/pull/18991#discussion_r2786380123
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/WeightedCostFunction.java:
##########
@@ -73,12 +68,19 @@ public CostResult computeCost(
}
} else {
// Lag recovery time is decreasing by adding tasks and increasing by ejecting tasks.
+ // In case of increasing lag, we apply an amplification factor to reflect the urgency of addressing lag.
// Caution: we rely only on the metrics, the real issues may be absolutely different, up to hardware failure.
- lagRecoveryTime = metrics.getAggregateLag() / (proposedTaskCount * avgProcessingRate);
+ if (metrics.getAggregateLag() <= 0) {
+ lagRecoveryTime = 0;
+ } else {
+ final double lagPerPartition = metrics.getAggregateLag() / metrics.getPartitionCount();
+ final double amplification = Math.max(1.0, 1.0 + LAG_AMPLIFICATION_MULTIPLIER * Math.log(lagPerPartition));
Review Comment:
> Would it be possible to see how the autoscaler does without the amplification?
No. Both the mathematical analysis and testing showed that, without the
amplification, the autoscaler behaves poorly under several lag conditions.
Generally, it scales, but not enough, and then the lag stagnates.
For close-to-zero lag, the amplification factor has almost no effect.
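To illustrate the close-to-zero case, here is a minimal standalone sketch of the amplification term from the hunk above. The class name, the method name, and the value `0.05` for the multiplier are assumptions made for illustration only; the actual `LAG_AMPLIFICATION_MULTIPLIER` is not shown in this hunk, and how the factor feeds into `lagRecoveryTime` is left out on purpose.

```java
// Standalone sketch, not the code from the PR.
final class LagAmplificationSketch
{
  // Hypothetical value; the real LAG_AMPLIFICATION_MULTIPLIER is not visible in this hunk.
  static final double ASSUMED_MULTIPLIER = 0.05;

  // Mirrors the expression in the diff: max(1, 1 + k * ln(lagPerPartition)).
  static double amplification(double aggregateLag, int partitionCount)
  {
    if (aggregateLag <= 0 || partitionCount <= 0) {
      // No lag (or no partitions): nothing to amplify.
      return 1.0;
    }
    final double lagPerPartition = aggregateLag / partitionCount;
    // ln(x) <= 0 for x <= 1, so Math.max clamps the factor to exactly 1.0 there.
    return Math.max(1.0, 1.0 + ASSUMED_MULTIPLIER * Math.log(lagPerPartition));
  }

  public static void main(String[] args)
  {
    final int partitions = 10;
    final double[] lags = {0, 5, 10, 1_000, 1_000_000};
    for (double lag : lags) {
      System.out.printf("aggregateLag=%.0f amplification=%.4f%n", lag, amplification(lag, partitions));
    }
  }
}
```

With at most one lagging record per partition the factor stays at 1.0, and beyond that it grows only logarithmically with per-partition lag, so the amplification matters mainly when lag is large.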
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]