kfaraz commented on code in PR #13993:
URL: https://github.com/apache/druid/pull/13993#discussion_r1152737075
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
Review Comment:
```suggestion
Issuing a GET request to the same URL returns the current Overlord dynamic
config.
```
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
Review Comment:
```suggestion
There are 4 options for select strategies:
```
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is
`weak`
+- a preferred worker (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available
and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for
the assigned datasources and no other.
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and
`equalDistributionWithCategorySpec` strategies) for a given task, the list of
workers eligible to be assigned is determined as follows:
-###### Equal Distribution
+- any worker if no categoryConfig is given for task type
+- any worker if categoryConfig is given for task type but no category is given
for datasource and there's no default category
+- a preferred worker (based on categoryConfig and category for datasource) if
available
+- any worker if categoryConfig and category are given but no preferred worker
is available and categoryConfig is `weak`
+- not assigned at all if preferred workers are not available and
`categoryConfig` is `strong`
-Tasks are assigned to the MiddleManager with the most free slots at the time
the task begins running. This is useful if
-you want work evenly distributed across your MiddleManagers.
+In both cases, Druid constructs the eligible worker and selects one depending
on their load with the goal of either distributing the load equally or filling
as few workers as possible.
Review Comment:
nit:
```suggestion
In both cases, Druid determines the list of eligible workers and selects
one depending on their load with the goal of either distributing the load
equally or filling as few workers as possible.
```
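To make the final selection step concrete, here is a minimal Python sketch of choosing one worker from the eligible list by load. It is only an illustration, not Druid's implementation: the function name `pick_worker` and the `(used_slots, capacity)` representation are hypothetical, and the `fillCapacity` rule (prefer the worker with the most used slots) is an assumption based on the "filling as few workers as possible" wording.

```python
def pick_worker(workers, strategy):
    """Pick one worker by load (hypothetical helper, not a Druid API).

    workers:  dict mapping worker host -> (used_slots, capacity)
    strategy: "equalDistribution" or "fillCapacity"
    Returns the chosen host, or None if no worker has a free slot.
    """
    # Only workers with at least one free slot can take the task.
    candidates = {w: (used, cap) for w, (used, cap) in workers.items() if used < cap}
    if not candidates:
        return None

    # Sort the names so that ties are broken deterministically.
    names = sorted(candidates)
    if strategy == "equalDistribution":
        # Spread load: choose the worker with the most free slots.
        return max(names, key=lambda w: candidates[w][1] - candidates[w][0])
    if strategy == "fillCapacity":
        # Concentrate load (assumed rule): choose the busiest worker that still has room.
        return max(names, key=lambda w: candidates[w][0])
    raise ValueError(f"unknown strategy: {strategy}")
```

With workers `{"w1": (1, 4), "w2": (3, 4)}`, the first strategy picks `w1` and the second picks `w2`, which is why `fillCapacity` leaves idle workers free to scale down.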
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
Review Comment:
```suggestion
To view the last `n` entries of the audit history of worker config, issue a
GET request to the following URL:
```
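As an illustration of the two audit-history requests, here is a minimal Python sketch that builds the URLs. The helper name `worker_history_url` is hypothetical, and the host, port, and interval values in the usage note are placeholders. Treating `interval` and `count` as mutually exclusive is a simplification of this sketch, not something the docs state.

```python
from urllib.parse import urlencode


def worker_history_url(overlord, *, interval=None, count=None):
    """Build the worker config audit-history URL (hypothetical helper).

    overlord: "<OVERLORD_IP>:<port>" of the Overlord
    interval: interval string to query, or None
    count:    number of most recent entries to return, or None
    """
    # Simplification for this sketch: require exactly one of the two parameters.
    if (interval is None) == (count is None):
        raise ValueError("pass exactly one of interval or count")
    params = {"interval": interval} if interval is not None else {"count": count}
    # Keep "/" and ":" unescaped so an interval stays readable in the query string.
    query = urlencode(params, safe="/:")
    return f"http://{overlord}/druid/indexer/v1/worker/history?{query}"
```

For example, `worker_history_url("localhost:8090", count=5)` yields the `count` form of the URL, and passing `interval="2023-01-01/2023-01-08"` yields the `interval` form.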
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is
`weak`
+- a preferred worker (a worker listed in `affinityConfig`) if available
Review Comment:
This item might be better placed before the previous item in the list:
```suggestion
- a preferred worker listed in the `affinityConfig` for this datasource if
it has available capacity
```
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is
`weak`
+- a preferred worker (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available
and affinity is `strong`
Review Comment:
```suggestion
- no worker if preferred workers are not available and affinity is _strong_,
i.e. `strong: true`. In this case, the task remains in the "pending" state. The
chosen provisioning strategy (e.g. `pendingTaskBased`) may then use the total
number of pending tasks to determine whether a new node should be provisioned.
```
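The four affinity rules in this list can be read as a small decision procedure. The Python sketch below is one possible reading of the documented behavior, not Druid's actual code. The function name and the plain set/dict inputs are hypothetical.

```python
def eligible_workers_affinity(datasource, available, affinity, strong):
    """Return the workers a task of `datasource` may be assigned to.

    available: set of worker hosts that currently have free capacity
    affinity:  dict mapping datasource -> list of preferred worker hosts
    strong:    True for strong affinity, False for weak affinity
    An empty result means the task stays pending.
    """
    # Workers listed anywhere in the affinity map are reserved for their datasources.
    affinity_workers = {w for hosts in affinity.values() for w in hosts}
    non_affinity = available - affinity_workers

    preferred = affinity.get(datasource)
    if not preferred:
        # No affinity specified for this datasource: use non-affinity workers.
        return non_affinity

    preferred_available = available & set(preferred)
    if preferred_available:
        return preferred_available

    # Preferred workers are unavailable: weak affinity falls back to
    # non-affinity workers, strong affinity assigns no worker at all.
    return set() if strong else non_affinity
```

Returning an empty set for the strong case mirrors the "remains pending" outcome, which is the signal a provisioning strategy could act on.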
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is
`weak`
+- a preferred worker (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available
and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for
the assigned datasources and no other.
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and
`equalDistributionWithCategorySpec` strategies) for a given task, the list of
workers eligible to be assigned is determined as follows:
Review Comment:
Similar suggestions here as with `affinityConfig`:
```suggestion
If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec`
and `equalDistributionWithCategorySpec` strategies), then a task of a given
datasource may be assigned to:
```
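The category-based rules can be sketched the same way. This is a hedged illustration of the behavior described in the PR text, not the real implementation. The function name is hypothetical, and the keys `categoryAffinity` and `defaultCategory` are assumed field names used here only for illustration.

```python
def eligible_workers_category(task_type, datasource, available, category_map, strong):
    """Return the workers a task may be assigned to under a category spec.

    available:    dict mapping worker host -> worker category
    category_map: dict mapping task type -> category config (assumed shape:
                  {"defaultCategory": str, "categoryAffinity": {datasource: str}})
    strong:       True for strong category affinity, False for weak
    An empty result means the task is not assigned.
    """
    all_workers = set(available)

    config = category_map.get(task_type)
    if config is None:
        # No category config for this task type: any worker is eligible.
        return all_workers

    # The datasource-specific category wins, otherwise fall back to the default.
    category = config.get("categoryAffinity", {}).get(datasource) or config.get("defaultCategory")
    if category is None:
        return all_workers

    preferred = {w for w, c in available.items() if c == category}
    if preferred:
        return preferred

    # No preferred worker available: weak allows any worker, strong allows none.
    return set() if strong else all_workers
```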
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is
`weak`
+- a preferred worker (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available
and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for
the assigned datasources and no other.
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and
`equalDistributionWithCategorySpec` strategies) for a given task, the list of
workers eligible to be assigned is determined as follows:
-###### Equal Distribution
+- any worker if no categoryConfig is given for task type
+- any worker if categoryConfig is given for task type but no category is given
for datasource and there's no default category
+- a preferred worker (based on categoryConfig and category for datasource) if
available
+- any worker if categoryConfig and category are given but no preferred worker
is available and categoryConfig is `weak`
+- not assigned at all if preferred workers are not available and
`categoryConfig` is `strong`
-Tasks are assigned to the MiddleManager with the most free slots at the time
the task begins running. This is useful if
-you want work evenly distributed across your MiddleManagers.
+In both cases, Druid constructs the eligible worker and selects one depending
on their load with the goal of either distributing the load equally or filling
as few workers as possible.
+
+If you are using auto-scaling, use the `fillCapacity` select strategy since
auto-scaled nodes can
+not be assigned a category, and you want the work to be concentrated on the
fewest number of workers to allow the empty ones to scale down.
+
+###### `equalDistribution`
+
+Tasks are assigned to the MiddleManager with the most free slots at the time
the task begins running.
+This evenly distributes work across your MiddleManagers.
|Property|Description|Default|
|--------|-----------|-------|
-|`type`|`equalDistribution`.|required; must be `equalDistribution`|
-|`affinityConfig`|[Affinity config](#affinity) object|null (no affinity)|
+|`type`|`equalDistribution`|required; must be `equalDistribution`|
+|`affinityConfig`|[Affinity config](#affinityconfig) object|null (no affinity)|
-###### Equal Distribution With Category Spec
+###### `equalDistributionWithCategorySpec`
-This strategy is a variant of `Equal Distribution`, which support
`workerCategorySpec` field rather than `affinityConfig`. By specifying
`workerCategorySpec`, you can assign tasks to run on different categories of
MiddleManagers based on the tasks' **taskType** and **dataSource name**. This
strategy can't work with `AutoScaler` since the behavior is undefined.
+This strategy is a variant of `equalDistribution`, which supports
`workerCategorySpec` field rather than `affinityConfig`.
+By specifying `workerCategorySpec`, you can assign tasks to run on different
categories of MiddleManagers based on the task's **type** and **dataSource**.
Review Comment:
nit:
```suggestion
By specifying `workerCategorySpec`, you can assign tasks to run on different
categories of MiddleManagers based on the **type** and **dataSource** of the
task.
```
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
Review Comment:
```suggestion
At a high level, the select strategy determines the list of eligible workers
for a given task using
```
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is
`weak`
+- a preferred worker (a worker listed in `affinityConfig`) if available
+- not assigned at all (remains pending) if preferred workers are not available
and affinity is `strong`
+
+Note that every worker listed in the `affinityConfig` will only be used for
the assigned datasources and no other.
-Worker select strategies control how Druid assigns tasks to MiddleManagers.
+If a `categorySpec` is provided (as part of `fillCapacityWithCategorySpec` and
`equalDistributionWithCategorySpec` strategies) for a given task, the list of
workers eligible to be assigned is determined as follows:
-###### Equal Distribution
+- any worker if no categoryConfig is given for task type
+- any worker if categoryConfig is given for task type but no category is given
for datasource and there's no default category
+- a preferred worker (based on categoryConfig and category for datasource) if
available
+- any worker if categoryConfig and category are given but no preferred worker
is available and categoryConfig is `weak`
+- not assigned at all if preferred workers are not available and
`categoryConfig` is `strong`
-Tasks are assigned to the MiddleManager with the most free slots at the time
the task begins running. This is useful if
-you want work evenly distributed across your MiddleManagers.
+In both cases, Druid constructs the eligible worker and selects one depending
on their load with the goal of either distributing the load equally or filling
as few workers as possible.
+
+If you are using auto-scaling, use the `fillCapacity` select strategy since
auto-scaled nodes can
Review Comment:
We should probably emphasize this point by putting it in a note or warning
box.
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
+
+- a non-affinity worker if no affinity is specified for that datasource. Any
worker not listed in the `affinityConfig` is considered a non-affinity worker.
+- a non-affinity worker if preferred workers are not available and affinity is
`weak`
Review Comment:
I think we should either italicize "weak" (since it is not really code or
config), or use `strong: false`.
```suggestion
- a non-affinity worker if preferred workers are not available and the
affinity is _weak_ i.e. `strong: false`.
```
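For context, a minimal sketch of what a weak affinity could look like in the dynamic config payload. The datasource name and MiddleManager address are hypothetical, and the layout assumes that `affinityConfig` takes an `affinity` map plus a boolean `strong` flag:

```python
import json

# Hypothetical datasource and MiddleManager address, for illustration only.
dynamic_config = {
    "selectStrategy": {
        "type": "equalDistribution",
        "affinityConfig": {
            # datasource -> list of preferred workers
            "affinity": {"wikipedia": ["middlemanager1.example.com:8091"]},
            # weak affinity: fall back to non-affinity workers if needed
            "strong": False,
        },
    }
}

# Serialize to the JSON body that would be posted to the Overlord.
payload = json.dumps(dynamic_config, indent=2)
print(payload)
```

Spelling out `strong: false` in the docs would match what users actually type in the config.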
##########
docs/configuration/index.md:
##########
@@ -1223,52 +1223,86 @@ A sample worker config spec is shown below:
}
```
-Issuing a GET request at the same URL will return the current worker config
spec that is currently in place. The worker config spec list above is just a
sample for EC2 and it is possible to extend the code base for other deployment
environments. A description of the worker config spec is shown below.
+Issuing a GET request at the same URL returns the current Overlord dynamic
config spec.
-|Property|Description|Default|
-|--------|-----------|-------|
-|`selectStrategy`|How to assign tasks to MiddleManagers. Choices are
`fillCapacity`, `equalDistribution`, and `javascript`.|equalDistribution|
-|`autoScaler`|Only used if autoscaling is enabled. See below.|null|
+|Property| Description
| Default |
+|--------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
+|`selectStrategy`| Describes how to assign tasks to MiddleManagers. The type
can be `equalDistribution`, `equalDistributionWithCategorySpec`,
`fillCapacity`, `fillCapacityWithCategorySpec`, and `javascript`. |
`{"type":"equalDistribution"} |
+|`autoScaler`| Only used if autoscaling is enabled. See below.
| null |
To view the audit history of worker config issue a GET request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?interval=<interval>
```
-default value of interval can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
+The default value of `interval` can be specified by setting
`druid.audit.manager.auditHistoryMillis` (1 week if not configured) in Overlord
runtime.properties.
To view last `n` entries of the audit history of worker config issue a GET
request to the URL -
```
http://<OVERLORD_IP>:<port>/druid/indexer/v1/worker/history?count=<n>
```
-##### Worker Select Strategy
+##### Worker select strategy
+
+The select strategy controls how Druid assigns tasks to workers
(MiddleManagers).
+At a high level, the select strategy determines the list of possible workers
that a task can be assigned to using
+either an `affinityConfig` or a `categorySpec`. Then, Druid assigns the task
by either trying to distribute load equally
+(`equalDistribution`) or to fill as many workers as possible to capacity
(`fillCapacity`).
+There are 4 options for select strategies:
+
+- [`equalDistribution`](#equaldistribution)
+- [`equalDistributionWithCategorySpec`](#equalDistributionWithCategorySpec)
+- [`fillCapacity`](#fillcapacity)
+- [`fillCapacityWithCategorySpec`](#fillcapacitywithcategoryspec)
+
+A `javascript` option is also available but should only be used for
prototyping new strategies.
+
+If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies) for a given task, the list of workers eligible
to be assigned is determined as follows:
Review Comment:
Slight rewording of the latter part of this sentence and the list items for a
little more readability:
```suggestion
If an `affinityConfig` is provided (as part of `fillCapacity` and
`equalDistribution` strategies), then a task of a given datasource may be
assigned to:
```
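To check that the list items read unambiguously, here is a rough sketch of the rules as the doc describes them. It is not Druid's implementation; `eligible_workers` and its parameters are hypothetical names, and it assumes that a worker listed anywhere in `affinityConfig` is reserved for its assigned datasources:

```python
from typing import Dict, List


def eligible_workers(
    datasource: str,
    available_workers: List[str],
    affinity: Dict[str, List[str]],
    strong: bool,
) -> List[str]:
    """Return the workers a task of `datasource` may be assigned to.

    An empty list means the task stays pending.
    """
    # Any worker listed in the affinity map is an affinity worker.
    listed = {w for workers in affinity.values() for w in workers}
    non_affinity = [w for w in available_workers if w not in listed]

    preferred_for_ds = affinity.get(datasource)
    if not preferred_for_ds:
        # No affinity specified for this datasource.
        return non_affinity

    preferred = [w for w in available_workers if w in preferred_for_ds]
    if preferred:
        # A preferred worker is available.
        return preferred

    # No preferred worker is available: strong affinity keeps the task
    # pending, weak affinity falls back to non-affinity workers.
    return [] if strong else non_affinity
```

If the reworded list items map one-to-one onto these branches, the section should be easy to follow.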
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]