kfaraz commented on code in PR #18819:
URL: https://github.com/apache/druid/pull/18819#discussion_r2620478813
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/WeightedCostFunction.java:
##########
@@ -0,0 +1,324 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.seekablestream.supervisor.autoscaler;
+
+import org.apache.druid.java.util.common.logger.Logger;
+
+/**
+ * Weighted cost function combining lag, idle time, and change distance metrics.
+ * Uses adaptive bounds for normalization based on recent history.
+ */
+public class WeightedCostFunction

Review Comment:
I like the idea of the WeightedCostFunction, but I think we need to make it much more intuitive.

#### Define requirements (already aligned with the current state of this PR)
1. A function that computes cost.
2. A task count with a lower cost is better.
3. `cost = lagCost * lagWeight + idlenessCost * idleWeight`
4. The lower the task count, the higher the predicted lag, and thus the higher the lag cost.
5. The higher the task count, the higher the predicted idleness, and thus the higher the idleness cost.

#### Simplify computations
1. Use linear scaling only; logarithmic scaling makes the terms difficult to reason about and debug.
   The diminishing returns effect is already enforced by the window (discrete distance). If more terms are needed to account for, say, task operational overhead, we will add them in the future.
2. Use only one mode, i.e. do not invert the scaling, even when lag is abnormally high.
3. Use actual metrics instead of normalized or adaptive bounds. If a supervisor once saw a lag of 100M, the adaptive ratio would make a lag of 1M seem very small (`normalizedLag = 0.01`, i.e. 1%). But in reality, a lag of 1M is bad too and needs to be given appropriate weight.
4. Always perform the cost computation, even if idleness is in the accepted range (0.2-0.6 in the PR). This would help us validate the correctness of the formula against real clusters by verifying that the current task count gives the minimal cost.

We may re-introduce some of these enhancements in later patches once we have more data points from using this auto-scaler, but it is best to start as simple as possible.

#### Use an intuitive metric, e.g. compute time
Connect the result of the cost function to an actual metric to make it more intuitive. The best metric I can think of is compute time or compute cycles, as it may be related to the actual monetary cost of running tasks. For example, what if we could model the cost as follows:
1. `lagCost = expected time (in seconds) required to recover current lag`
2. `idlenessCost = total compute time (in seconds) wasted being idle in a single taskDuration`
3. Intuitively, we can see that as the task count increases, `lagCost` would decrease and `idlenessCost` would increase.
The formula for these costs may be something like:
```
lagCost = expected time (in seconds) required to recover current lag
        = currentAggregateLag / (proposedTaskCount * avgRateOfProcessing)

where,
currentAggregateLag = sum of current lag (in records) across all partitions
avgRateOfProcessing = average of the task moving averages
```
```
idlenessCost = total time (in seconds) wasted being idle in a single taskDuration
             = total task run time * predicted idleness ratio

where,
total task run time = (proposedTaskCount * taskDuration)
predicted idleness ratio = (proposedTaskCount / currentTaskCount) - (1 - avgPollToIdleRatio)

e.g. if the current poll-to-idle ratio is 0.7, tasks are idle 70% of the time, so reducing the task count by 70% will make the tasks busy all the time (idleness ratio = 0).
```

##### Assumptions
- Tasks are already at their peak processing rate and will remain at this rate.
- The poll-to-idle ratio scales linearly with task count. We may use some reasonable clamps for the min (say 0.05) and max (say 0.95).
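To make the proposed model concrete, here is a rough Java sketch of the two cost terms and their weighted combination. Everything here is illustrative, not part of the PR: the class and method names (`CostModelSketch`, `lagCost`, `idlenessCost`, `cost`) are hypothetical, and the 0.05/0.95 clamps are just the "reasonable" values suggested above.

```java
// Hypothetical sketch of the proposed compute-time cost model.
class CostModelSketch
{
  // Assumed clamps on the predicted idleness ratio, per the assumptions above.
  private static final double MIN_IDLE_RATIO = 0.05;
  private static final double MAX_IDLE_RATIO = 0.95;

  /**
   * lagCost = expected time (in seconds) required to recover the current lag
   *         = currentAggregateLag / (proposedTaskCount * avgRateOfProcessing)
   */
  static double lagCost(double currentAggregateLag, int proposedTaskCount, double avgRateOfProcessing)
  {
    return currentAggregateLag / (proposedTaskCount * avgRateOfProcessing);
  }

  /**
   * idlenessCost = total compute time (in seconds) wasted being idle in a single
   * taskDuration = (proposedTaskCount * taskDuration) * predictedIdleRatio.
   * The idle ratio is assumed to scale linearly with task count and is clamped.
   */
  static double idlenessCost(
      int proposedTaskCount,
      int currentTaskCount,
      long taskDurationSeconds,
      double avgPollToIdleRatio
  )
  {
    double predictedIdleRatio =
        ((double) proposedTaskCount / currentTaskCount) - (1.0 - avgPollToIdleRatio);
    predictedIdleRatio = Math.max(MIN_IDLE_RATIO, Math.min(MAX_IDLE_RATIO, predictedIdleRatio));
    return proposedTaskCount * taskDurationSeconds * predictedIdleRatio;
  }

  /** cost = lagCost * lagWeight + idlenessCost * idleWeight; a lower cost is better. */
  static double cost(
      int proposedTaskCount,
      int currentTaskCount,
      double currentAggregateLag,
      double avgRateOfProcessing,
      long taskDurationSeconds,
      double avgPollToIdleRatio,
      double lagWeight,
      double idleWeight
  )
  {
    return lagWeight * lagCost(currentAggregateLag, proposedTaskCount, avgRateOfProcessing)
           + idleWeight * idlenessCost(proposedTaskCount, currentTaskCount, taskDurationSeconds, avgPollToIdleRatio);
  }
}
```

For instance, with an aggregate lag of 1M records and 10 tasks each processing 1000 records/s, `lagCost` is 100 seconds and doubling the task count halves it, while shrinking to 3 tasks (a 70% reduction, at a 0.7 poll-to-idle ratio) drives the predicted idle ratio down to the 0.05 floor, shrinking the idleness term accordingly.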
