jihoonson commented on a change in pull request #11444:
URL: https://github.com/apache/druid/pull/11444#discussion_r672709947
##########
File path: docs/configuration/index.md
##########
@@ -1015,7 +1015,7 @@ There are additional configs for autoscaling (if it is enabled):
|`druid.indexer.autoscale.pendingTaskTimeout`|How long a task can be in "pending" state before the Overlord tries to scale up.|PT30S|
|`druid.indexer.autoscale.workerVersion`|If set, will only create nodes of set version during autoscaling. Overrides dynamic configuration. |null|
|`druid.indexer.autoscale.workerPort`|The port that MiddleManagers will run on.|8080|
-|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when you have a homogeneous cluster and the average of `druid.worker.capacity` across the workers when you have a heterogeneous cluster. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
+|`druid.indexer.autoscale.workerCapacityHint`| Worker capacity for determining the number of workers needed for auto scaling when there is currently no worker running. If unset or set to value of 0 or less, auto scaler will scale to `minNumWorkers` in autoScaler config instead. This value should typically be equal to `druid.worker.capacity` when your workers have a homogeneous capacity and the average of `druid.worker.capacity` across the workers when your workers have a heterogeneous capacity. Note: this config is only applicable to `pendingTaskBased` provisioning strategy|-1|
Review comment:
@techdocsmith hmm, maybe this config needs a more detailed description of how the auto scaler works. When there are ingestion jobs pending, the auto scaler first computes how many new nodes are required to unblock those pending tasks. Since each worker (middleManager or indexer) can run more than one task at the same time (depending on `druid.worker.capacity`), the number of new nodes to spin up is roughly `ceil(pending task count / worker capacity)`. The problem is, the auto scaler runs on the Overlord and is not aware of `druid.worker.capacity`. Also, each worker can have a different value set for `druid.worker.capacity` in a heterogeneous cluster. As a result, the auto scaler currently detects the worker capacity from the running workers. However, this cannot work when there are no workers running. This PR works around the issue by adding a new config, `workerCapacityHint`, which the auto scaler can use as a hint to compute the number of new workers to spin up even when there are no workers running. So, to answer your questions,
> "Worker capacity for determining the number of new workers". If i set it
to 5, does the autoscaler spin up 5 workers? If so it's just the number of new
workers.
It depends on how many pending tasks you have. If you have 25 pending tasks, then yes, it will spin up 5 new workers. If you have 8 pending tasks, the auto scaler will spin up 2 new workers, because it is hinted that each worker has 5 task slots.
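The estimate described above can be sketched in a few lines. This is a hypothetical helper for illustration, not Druid's actual implementation; the fallback to `minNumWorkers` when the hint is unset (or 0 or less) follows the config description in the diff:

```python
import math

def workers_to_spin_up(pending_tasks: int,
                       worker_capacity_hint: int,
                       min_num_workers: int) -> int:
    """Estimate new workers needed when no workers are currently running.

    Hypothetical sketch of the pendingTaskBased estimate:
    - if the hint is unset (<= 0), fall back to minNumWorkers
      from the autoScaler config;
    - otherwise assume each new worker provides
      `worker_capacity_hint` task slots.
    """
    if worker_capacity_hint <= 0:
        return min_num_workers
    return math.ceil(pending_tasks / worker_capacity_hint)

print(workers_to_spin_up(25, 5, 1))  # 25 pending tasks, hint 5 -> 5 workers
print(workers_to_spin_up(8, 5, 1))   # 8 pending tasks, hint 5 -> 2 workers
print(workers_to_spin_up(8, -1, 3))  # hint unset -> fall back to minNumWorkers
```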
> Also If I set it to 5, does each worker thn need to have 5 task slots too?
I wasn't sure of that relationship: " Each worker (middleManager or indexer) is
assumed to have this amount of task slots."
I think the relationship runs in the opposite direction: when each worker has 5 task slots, you should set this config to 5 so that the auto scaler can correctly estimate the number of new workers needed.
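For reference, a minimal sketch of what that might look like in the Overlord's runtime properties (the value 5 is an illustrative assumption, not a default):

```properties
# Assuming a homogeneous cluster where every worker sets druid.worker.capacity=5
druid.indexer.autoscale.workerCapacityHint=5
```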
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]