Fly-Style commented on code in PR #18936:
URL: https://github.com/apache/druid/pull/18936#discussion_r2721936785
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/WeightedCostFunction.java:
##########
@@ -106,10 +110,14 @@ public CostResult computeCost(CostMetrics metrics, int proposedTaskCount, CostBa

   /**
-   * Estimates the idle ratio for a given task count using a capacity-based linear model.
+   * Estimates the idle ratio for a proposed task count.
+   * Includes lag-based adjustment to eliminate high lag and

Review Comment:
> Also, in addition to the above, I think adding in this lag consideration does add some complexity here. Mainly it generally starts us down the path of making the cost function harder to easily and quickly understand for a newcomer, IMO.

Sometimes you want complex things in the project because they make certain things work slightly better. A good example is the query planner / query optimizer we get from the Calcite side: it is not easy to enter and hard to master, but with that complexity it brings a solid framework for using SQL with your database. The same applies here: to make supervisor autoscaling work well, we need to introduce a level of complexity backed by math (the formulas are described here: https://github.com/apache/druid/pull/18819). During testing, I realized the model is too conservative in high-lag scenarios, and this change is an attempt to tweak it a bit.

_Picking up your question from the general comment:_

> I also wonder if a more in depth technical writeup once this feature is stabilized is in order. Something that explains the method to the madness and a bit of the math. Perhaps in a brief blog post or docs page within the apache Druid website?

We definitely should do that, but the feature is not yet fully stabilized; in any case, it already has a decent base.

> So I guess that begs the question, how did we or are we going to measure the improvement that this additional logic/computation provides?

That's a very good question, and I would answer it this way: the less time we spend manually scaling supervisors and fine-tuning the autoscaler, the better the result we get.
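For illustration, here is a minimal, self-contained sketch of the idea described in the javadoc above: a capacity-based linear estimate of the idle ratio, damped by a lag term so that high-lag scenarios are treated less conservatively. This is not the actual `WeightedCostFunction` code from the PR; the names (`requiredTaskCount`, `lagSeconds`, `lagDampingFactor`) and the specific damping formula are hypothetical.

```java
// Illustrative sketch only -- not the actual Druid WeightedCostFunction implementation.
// All names (requiredTaskCount, lagSeconds, lagDampingFactor) are hypothetical.
public class IdleRatioSketch
{
  /**
   * Capacity-based linear model: if the workload needs requiredTaskCount tasks' worth of
   * capacity and proposedTaskCount tasks are running, the surplus shows up as idle time.
   */
  static double baseIdleRatio(double requiredTaskCount, int proposedTaskCount)
  {
    if (proposedTaskCount <= 0) {
      return 0.0;
    }
    double idle = 1.0 - (requiredTaskCount / proposedTaskCount);
    return Math.max(0.0, Math.min(1.0, idle));
  }

  /**
   * Lag-based adjustment: when lag is high, discount the estimated idle ratio so the
   * cost function is less conservative about adding tasks while the supervisor is behind.
   */
  static double lagAdjustedIdleRatio(
      double requiredTaskCount,
      int proposedTaskCount,
      double lagSeconds,
      double lagDampingFactor
  )
  {
    double base = baseIdleRatio(requiredTaskCount, proposedTaskCount);
    double damping = 1.0 / (1.0 + lagDampingFactor * lagSeconds);
    return base * damping;
  }

  public static void main(String[] args)
  {
    // Example: workload needs ~3.2 tasks' worth of capacity, 5 tasks proposed,
    // 600 seconds of lag, damping factor 0.001.
    System.out.println(lagAdjustedIdleRatio(3.2, 5, 600.0, 0.001));
  }
}
```

With a positive `lagDampingFactor`, the estimated idle ratio shrinks as lag grows, which is one way a cost function could be made less conservative about scaling up under high lag.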
