jihoonson commented on a change in pull request #11440:
URL: https://github.com/apache/druid/pull/11440#discussion_r669296990
##########
File path:
extensions-contrib/gce-extensions/src/main/java/org/apache/druid/indexing/overlord/autoscaling/gce/GceAutoScaler.java
##########
@@ -80,8 +80,8 @@ public GceAutoScaler(
@JsonProperty("envConfig") GceEnvironmentConfig envConfig
)
{
- Preconditions.checkArgument(minNumWorkers > 0,
- "minNumWorkers must be greater than 0");
+ Preconditions.checkArgument(minNumWorkers >= 0,
Review comment:
I think the overlord should fail to start if `minNumWorkers = 0` and the
new config is not set to a positive because auto scaler will not work anyway.
It would be better to fail early.
##########
File path: docs/configuration/index.md
##########
@@ -1015,6 +1015,7 @@ There are additional configs for autoscaling (if it is
enabled):
|`druid.indexer.autoscale.pendingTaskTimeout`|How long a task can be in
"pending" state before the Overlord tries to scale up.|PT30S|
|`druid.indexer.autoscale.workerVersion`|If set, will only create nodes of set
version during autoscaling. Overrides dynamic configuration. |null|
|`druid.indexer.autoscale.workerPort`|The port that MiddleManagers will run
on.|8080|
+|`druid.indexer.autoscale.workerCapacityFallback`| Worker capacity for
determining the number of workers needed for auto scaling when there is
currently no worker running. If unset or set to value of 0 or less, auto scaler
will scale to `minNumWorkers` in autoScaler config instead. Note: this config
is only applicable to `pendingTaskBased` provisioning strategy|-1|
Review comment:
"fallback" in the name does not seem intuitive to me. How about
`workerCapacityHint`? Also, it would be nice to document how to set this value.
For example, the doc can say "this value should be typically equal to
`druid.worker.capacity` when you have a homogeneous cluster". For heterogeneous
clusters, does it make sense to set this to the average capacity?
##########
File path:
indexing-service/src/main/java/org/apache/druid/indexing/overlord/autoscaling/PendingTaskBasedWorkerProvisioningStrategy.java
##########
@@ -246,13 +246,16 @@ private int getScaleUpNodeCount(
log.info("Min/max workers: %d/%d", minWorkerCount, maxWorkerCount);
final int currValidWorkers = getCurrValidWorkers(workers);
- // If there are no worker, spin up minWorkerCount, we cannot determine
the exact capacity here to fulfill the need
- // since we are not aware of the expectedWorkerCapacity.
- int moreWorkersNeeded = currValidWorkers == 0 ? minWorkerCount :
getWorkersNeededToAssignTasks(
+ // If there are no worker and workerCapacityFallback config is not set
(-1) or invalid (<= 0), then spin up minWorkerCount
+ // as we cannot determine the exact capacity here to fulfill the need.
+ // However, if there are no worker but workerCapacityFallback config is
set (>0), then we can
+ // determine the number of workers needed using workerCapacityFallback
config as expected worker capacity
+ int moreWorkersNeeded = currValidWorkers == 0 &&
config.getWorkerCapacityFallback() <= 0 ? minWorkerCount :
getWorkersNeededToAssignTasks(
Review comment:
nit: this line is too long. Please break the line. One example would be
```java
int moreWorkersNeeded = currValidWorkers == 0 &&
config.getWorkerCapacityFallback() <= 0
? minWorkerCount
: getWorkersNeededToAssignTasks(
remoteTaskRunnerConfig,
workerConfig,
pendingTasks,
workers,
config.getWorkerCapacityFallback()
);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]