kfaraz commented on code in PR #18819:
URL: https://github.com/apache/druid/pull/18819#discussion_r2620478813
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/WeightedCostFunction.java:
##########
@@ -0,0 +1,324 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.seekablestream.supervisor.autoscaler;
+
+import org.apache.druid.java.util.common.logger.Logger;
+
+/**
+ * Weighted cost function combining lag, idle time, and change distance metrics.
+ * Uses adaptive bounds for normalization based on recent history.
+ */
+public class WeightedCostFunction

Review Comment:
I like the idea of the WeightedCostFunction, but I think we need to make it much more intuitive.

#### Define requirements (already aligned with the current state of this PR)
1. A function that computes cost.
2. A task count with a lower cost is better.
3. `cost = lagCost * lagWeight + idlenessCost * idleWeight`
4. The lower the task count, the higher the predicted lag, and thus the higher the lag cost.
5. The higher the task count, the higher the predicted idleness, and thus the higher the idleness cost.

#### Simplify computations
1. Use linear scaling only; logarithmic scaling makes the terms difficult to reason about and debug.
   The diminishing returns effect is already enforced by the window (discrete distance). If more terms are needed to account for, say, task operational overhead, we will add them in the future.
2. Use only one mode, i.e. do not invert the scaling, even when lag is abnormally high.
3. Use actual metrics instead of normalized or adaptive bounds. If a supervisor once saw a lag of 100M, the adaptive ratio would make a lag of 1M seem very small (`normalizedLag = 0.01`, i.e. 1%). But in reality, a lag of 1M is bad too and needs to be given appropriate weight.
4. Always perform the cost computation, even if idleness is in the accepted range (0.2-0.6 in the PR). This would help us validate the correctness of the formula against real clusters by verifying that the current task count gives the minimal cost.

We may re-introduce some of these enhancements in later patches once we have more data points from using this auto-scaler, but it is best to start as simple as possible.

#### Use an intuitive metric, e.g. compute time
Connect the result of the cost function to an actual metric to make it more intuitive. The best metric I can think of is compute time or compute cycles, as it may be related to the actual monetary cost of running tasks. For example, what if we could model the cost as follows:
1. `lagCost = expected time (in seconds) required to recover current lag`
2. `idlenessCost = total compute time (in seconds) wasted being idle in a single taskDuration`
3. Intuitively, we can see that as the task count increases, `lagCost` would decrease and `idlenessCost` would increase.
The formula for these costs may be something like:
```
lagCost = expected time (in seconds) required to recover current lag
        = currentAggregateLag / (proposedTaskCount * avgRateOfProcessing)

where,
currentAggregateLag = sum of current lag (in records) across all partitions
avgRateOfProcessing = average of the task moving averages
```
```
idlenessCost = total time (in seconds) wasted being idle in a single taskDuration
             = total task run time * predicted idleness ratio

where,
total task run time = (proposedTaskCount * taskDuration)
predicted idleness ratio = (proposedTaskCount / currentTaskCount) - (1 - avgPollToIdleRatio)

e.g. if the current poll-to-idle ratio is 0.7, tasks are idle 70% of the time, so reducing the task count by 70% will make the tasks busy all the time (idleness ratio = 0).
```

##### Assumptions
- Tasks are already at their peak processing rate and will remain at this rate.
- The poll-to-idle ratio scales linearly with task count. We may use some reasonable clamps for the min (say 0.05) and max (say 0.95).
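To make the proposed model concrete, here is a rough Java sketch of the two cost terms and their weighted combination. Everything here is illustrative, not part of the PR: the class and method names (`CostModelSketch`, `lagCost`, `idlenessCost`, `cost`) are hypothetical, and the 0.05/0.95 clamps are just the "reasonable" values suggested above.

```java
// Hypothetical sketch of the proposed compute-time cost model.
class CostModelSketch
{
  // Assumed clamps on the predicted idleness ratio, per the assumptions above.
  private static final double MIN_IDLE_RATIO = 0.05;
  private static final double MAX_IDLE_RATIO = 0.95;

  /**
   * lagCost = expected time (in seconds) required to recover the current lag
   *         = currentAggregateLag / (proposedTaskCount * avgRateOfProcessing)
   */
  static double lagCost(double currentAggregateLag, int proposedTaskCount, double avgRateOfProcessing)
  {
    return currentAggregateLag / (proposedTaskCount * avgRateOfProcessing);
  }

  /**
   * idlenessCost = total compute time (in seconds) wasted being idle in a single
   * taskDuration = (proposedTaskCount * taskDuration) * predictedIdleRatio.
   * The idle ratio is assumed to scale linearly with task count and is clamped.
   */
  static double idlenessCost(
      int proposedTaskCount,
      int currentTaskCount,
      long taskDurationSeconds,
      double avgPollToIdleRatio
  )
  {
    double predictedIdleRatio =
        ((double) proposedTaskCount / currentTaskCount) - (1.0 - avgPollToIdleRatio);
    predictedIdleRatio = Math.max(MIN_IDLE_RATIO, Math.min(MAX_IDLE_RATIO, predictedIdleRatio));
    return proposedTaskCount * taskDurationSeconds * predictedIdleRatio;
  }

  /** cost = lagCost * lagWeight + idlenessCost * idleWeight; a lower cost is better. */
  static double cost(
      int proposedTaskCount,
      int currentTaskCount,
      double currentAggregateLag,
      double avgRateOfProcessing,
      long taskDurationSeconds,
      double avgPollToIdleRatio,
      double lagWeight,
      double idleWeight
  )
  {
    return lagWeight * lagCost(currentAggregateLag, proposedTaskCount, avgRateOfProcessing)
           + idleWeight * idlenessCost(proposedTaskCount, currentTaskCount, taskDurationSeconds, avgPollToIdleRatio);
  }
}
```

For instance, with an aggregate lag of 1M records and 10 tasks each processing 1000 records/s, `lagCost` is 100 seconds and doubling the task count halves it, while shrinking to 3 tasks (a 70% reduction, at a 0.7 poll-to-idle ratio) drives the predicted idle ratio down to the 0.05 floor, shrinking the idleness term accordingly.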
