gargvishesh commented on code in PR #16681:
URL: https://github.com/apache/druid/pull/16681#discussion_r1771187508
##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
~ under the License.
-->
-In Apache Druid, compaction is a special type of ingestion task that reads
data from a Druid datasource and writes it back into the same datasource. A
common use case for this is to [optimally size
segments](../operations/segment-optimization.md) after ingestion to improve
query performance. Automatic compaction, or auto-compaction, refers to the
system for automatic execution of compaction tasks managed by the [Druid
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid
cluster. See the [examples](#examples) for common use cases for automatic
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing
period](../configuration/index.md#coordinator-operation),
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first
checks for segments to compact at most 30 minutes from when auto-compaction is
enabled.
-This time period affects other Coordinator duties including merge and
conversion tasks.
-To configure the auto-compaction time period without interfering with
`indexingPeriod`, see [Set frequency of compaction
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads
data from a Druid datasource and writes it back into the same datasource. A
common use case for this is to [optimally size
segments](../operations/segment-optimization.md) after ingestion to improve
query performance. Automatic compaction, or auto-compaction, refers to the
system for automatic execution of compaction tasks managed by the [Druid
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
-At every invocation of auto-compaction, the Coordinator initiates a [segment
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction)
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the
Overlord and MSQ task engine for compaction provides faster compaction times as
well as better memory tuning and usage. Both methods use the same syntax, but
you use different methods to submit the automatic compaction.
:::info
Auto-compaction skips datasources that have a segment granularity of `ALL`.
:::
As a best practice, you should set up auto-compaction for all Druid
datasources. You can run compaction tasks manually for cases where you want to
allocate more system resources. For example, you may choose to run multiple
compaction tasks in parallel to compact an existing datasource for the first
time. See [Compaction](compaction.md) for additional details and use cases.
+This topic guides you through setting up automatic compaction for your Druid
cluster. See the [examples](#examples) for common use cases for automatic
compaction.
-## Enable automatic compaction
+## Coordinator-based
-You can enable automatic compaction for a datasource using the web console or
programmatically via an API.
-This process differs for manual compaction tasks, which can be submitted from
the [Tasks view of the web console](../operations/web-console.md) or the [Tasks
API](../api-reference/tasks-api.md).
+The Coordinator [indexing
period](../configuration/index.md#coordinator-operation),
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction
tasks.
+The default indexing period is 30 minutes, meaning that the Coordinator first
checks for segments to compact at most 30 minutes from when auto-compaction is
enabled.
+This time period affects other Coordinator duties including merge and
conversion tasks.
+To configure the auto-compaction time period without interfering with
`indexingPeriod`, see [Set frequency of compaction runs](#compaction-frequency).
-### Web console
+At every invocation of auto-compaction, the Coordinator initiates a [segment
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction)
to determine eligible segments to compact.
+When there are eligible segments to compact, the Coordinator issues compaction
tasks based on available worker capacity.
+If a compaction task takes longer than the indexing period, the Coordinator
waits for it to finish before resuming the period for segment search.
-Use the web console to enable automatic compaction for a datasource as follows.
+No additional configuration is needed to run automatic compaction tasks using
the Coordinator and native engine. This is the default behavior for Druid.
-1. Click **Datasources** in the top-level navigation.
-2. In the **Compaction** column, click the edit icon for the datasource to
compact.
-3. In the **Compaction config** dialog, configure the auto-compaction
settings. The dialog offers a form view as well as a JSON view. Editing the
form updates the JSON specification, and editing the JSON updates the form
field, if present. Form fields not present in the JSON indicate default values.
You may add additional properties to the JSON for auto-compaction settings not
displayed in the form. See [Configure automatic
compaction](#configure-automatic-compaction) for supported settings for
auto-compaction.
-4. Click **Submit**.
-5. Refresh the **Datasources** view. The **Compaction** column for the
datasource changes from “Not enabled” to “Awaiting first run.”
+## Overlord-based
-The following screenshot shows the compaction config dialog for a datasource
with auto-compaction enabled.
-
+You can run automatic compaction using the Overlord rather than the
Coordinator. Running compaction tasks on the Overlord means that polling the
task status and running compaction at a higher frequency is more efficient than
a comparable compaction task that runs on the Coordinator. When running
compaction tasks using the Overlord, Druid checks to see if there is data to
compact in a datasource every 5 seconds.
-To disable auto-compaction for a datasource, click **Delete** from the
**Compaction config** dialog. Druid does not retain your auto-compaction
configuration.
+* In your Overlord runtime properties, set the following properties:
+ * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks
can be run as a supervisor task
+ * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ
task engine as the compaction engine or to `native`.
Review Comment:
`defaultEngine` has been renamed to `engine`.
##########
docs/data-management/automatic-compaction.md:
##########
@@ -142,6 +163,108 @@ druid.coordinator.compaction.duties=["compactSegments"]
druid.coordinator.compaction.period=PT60S
```
+## Use Overlord-based automatic compaction
+
+When you use the Overlord for automatic compaction, Druid uses a supervisor
task on the Overlord to perform the compaction. Since it's a supervisor task,
automatic compaction using the Overlord can run frequently while providing
faster compaction times as well as better memory tuning and usage.
+
+When you use Overlord-based automatic compaction, you can use either the
native engine like Coordinator-based automatic compaction or the [MSQ task
engine](#use-msq-for-automatic-compaction).
+
+By default, Druid checks every 5 seconds to see whether or not compaction is
required.
+
+### Use MSQ for automatic compaction
+
+The MSQ task engine is available as a compaction engine if you configure
compaction tasks to run on the Overlord as a supervisor. To use the MSQ task
engine for automatic compaction, make sure the following requirements are met:
+
+* Have the [MSQ task engine extension
loaded](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+ * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks
can be run as a supervisor task
+ * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ
task engine as the compaction engine
Review Comment:
`defaultEngine` has been renamed to `engine`:
```suggestion
* `druid.supervisor.compaction.engine` to `msq` to specify the MSQ task
engine as the compaction engine
```
##########
docs/multi-stage-query/known-issues.md:
##########
@@ -68,3 +68,15 @@ properties, and the `indexSpec`
[`tuningConfig`](../ingestion/ingestion-spec.md#
- The maximum number of elements in a window cannot exceed a value of 100,000.
- To avoid `leafOperators` in MSQ engine, window functions have an extra scan
stage after the window stage for cases
where native engine has a non-empty `leafOperator`.
+
+## Automatic compaction
+
+<!--This list also exists in data-management/automatic-compaction-->
+
+The following known issues and limitations affect automatic compaction with
the MSQ task engine:
+
+- The `metricSpec` field is only supported for idempotent aggregators. For
more information, see [Idempotent
aggregators](../data-management/automatic-compaction.md#idempotent-aggregators).
+- Only dynamic and range-based partitioning are supported
+- Set `rollup` to `true` if `metricSpec` is not empty or null. If
`metricSpec` is empty or null, set `rollup` to `false`.
+- You cannot group on multi-value dimensions
Review Comment:
The same comments I left on the limitations list in `automatic-compaction.md` apply here.
##########
docs/data-management/automatic-compaction.md:
##########
@@ -188,6 +163,108 @@ druid.coordinator.compaction.duties=["compactSegments"]
druid.coordinator.compaction.period=PT60S
```
+## Use Overlord-based automatic compaction
+
+When you use the Overlord for automatic compaction, Druid uses a supervisor
task on the Overlord to perform the compaction. Since it's a supervisor task,
automatic compaction using the Overlord can run frequently while providing
faster compaction times as well as better memory tuning and usage.
+
+When you use Overlord-based automatic compaction, you can use either the
native engine like Coordinator-based automatic compaction or the [MSQ task
engine](#use-msq-for-automatic-compaction).
+
+By default, Druid checks every 5 seconds to see whether or not compaction is
required.
+
+### Use MSQ for automatic compaction
+
+The MSQ task engine is available as a compaction engine if you configure
compaction tasks to run on the Overlord as a supervisor. To use the MSQ task
engine for automatic compaction, make sure the following requirements are met:
+
+* Have the [MSQ task engine extension
loaded](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+ * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks
can be run as a supervisor task
+ * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ
task engine as the compaction engine
+* Have at least two compaction task slots available or set
`compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine
requires at least two tasks to run, one controller task and one worker task.
+
+You can use [MSQ task engine context parameters](../multi-stage-query/) in
`compactionConfig.taskContext` when configuring your datasource for automatic
compaction, such as setting the maximum number of tasks using the
`compactionConfig.taskContext.maxNumTasks` parameter. Some of the MSQ task
engine context parameters overlap with automatic compaction parameters. When
these settings overlap, set one or the other.
+
+To submit an automatic compaction task, you submit a supervisor spec through
the UI or API with the type `autocompact` and the `spec` where you define the
compaction behavior using the [automatic compaction
syntax](#automatic-compaction-syntax). You can use the [web
console](#ui-for-overlord-based-compaction)
+
+### UI for Overlord-based compaction
+
+To submit a supervisor spec for MSQ task engine autocompaction, perform the
following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+ - The type of supervisor spec by setting `"type": "autocompact"`
+ - The compaction configuration by adding it to the `spec` field
+ ```json
+ {
+ "type": "autocompact",
+ "spec": {
+ "dataSource": YOUR_DATASOURCE,
+ ...
+ ...
+ }
+ ```
+1. Submit the supervisor.
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
+
+### API for Overlord-based compaction
+
+Submitting an automatic compaction as a supervisor task uses the same endpoint
as supervisor tasks for streaming ingestion.
+
+The following example configures auto-compaction for the `wikipedia`
datasource:
+
+```sh
+curl --location --request POST
'http://localhost:8081/druid/indexer/v1/supervisor' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+ "type": "autocompact", // required
+ "suspended": false, // optional
+ "spec": { // required
+ "dataSource": "wikipedia", // required
+ "tuningConfig": {...}, // optional
+ "granularitySpec": {...}, // optional
+ ...
+ }
+}'
+```
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
+
+### MSQ task engine limitations
+
+When using the MSQ task engine for auto-compaction, keep the following
limitations in mind:
+
+- The `metricSpec` field is only supported for idempotent aggregators. For
more information, see [Idempotent aggregators](#idempotent-aggregators).
+- Only dynamic and range-based partitioning are supported
+- Set `rollup` to `true` if `metricSpec` is not empty or null. If
`metricSpec` is empty or null, set `rollup` to `false`.
+- You cannot group on multi-value dimensions
+- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use
`maxRowsPerSegment` instead.
+
+#### Idempotent aggregators
+
+Idempotent aggregators are aggregators that can be applied repeatedly on a
column and each run produces the same results, such as the following `longSum`
aggregator:
+
+```
+{"name": "added", "type": "longSum", "fieldName": "added"}
+```
+
+where the input and output column are both `added`.
+
+The following are some examples of non-idempotent aggregators where each run
of the aggregator produces different results:
+
+* `longSum` aggregator where the `added` column rolls up into the `sum_added`
column:
+ ```
+ {"name": "sum_added", "type": "longSum", "fieldName": "added" }
+ ```
+* Partial sketches:
+ ```
+ {"name": added, "type":"", fieldName: added}
Review Comment:
Is the type intentionally skipped?
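To make the distinction in this section concrete, here is a rough sketch of the two `longSum` cases. It assumes the only signal is whether the input and output column names match; `looks_idempotent` is a hypothetical helper, not a Druid API, and it says nothing about sketch aggregators, whose type this example leaves blank.

```python
def looks_idempotent(aggregator: dict) -> bool:
    """Heuristic taken from the doc's longSum examples (an assumption,
    not Druid's actual check): the aggregator reads from and writes to
    the same column."""
    name = aggregator.get("name")
    # A spec without a name cannot be classified, so report False.
    return name is not None and name == aggregator.get("fieldName")


# Input and output column are both `added`.
same_column = {"name": "added", "type": "longSum", "fieldName": "added"}
# `added` rolls up into `sum_added`, so each run changes the result.
rolled_up = {"name": "sum_added", "type": "longSum", "fieldName": "added"}

print(looks_idempotent(same_column))
print(looks_idempotent(rolled_up))
```

If the partial sketch example keeps matching column names, a reader applying this heuristic would wrongly treat it as idempotent, which is another reason to spell out the type.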
##########
docs/data-management/automatic-compaction.md:
##########
@@ -142,6 +163,108 @@ druid.coordinator.compaction.duties=["compactSegments"]
druid.coordinator.compaction.period=PT60S
```
+## Use Overlord-based automatic compaction
+
+When you use the Overlord for automatic compaction, Druid uses a supervisor
task on the Overlord to perform the compaction. Since it's a supervisor task,
automatic compaction using the Overlord can run frequently while providing
faster compaction times as well as better memory tuning and usage.
+
+When you use Overlord-based automatic compaction, you can use either the
native engine like Coordinator-based automatic compaction or the [MSQ task
engine](#use-msq-for-automatic-compaction).
+
+By default, Druid checks every 5 seconds to see whether or not compaction is
required.
+
+### Use MSQ for automatic compaction
+
+The MSQ task engine is available as a compaction engine if you configure
compaction tasks to run on the Overlord as a supervisor. To use the MSQ task
engine for automatic compaction, make sure the following requirements are met:
+
+* Have the [MSQ task engine extension
loaded](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+ * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks
can be run as a supervisor task
+ * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ
task engine as the compaction engine
+* Have at least two compaction task slots available or set
`compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine
requires at least two tasks to run, one controller task and one worker task.
+
+You can use [MSQ task engine context parameters](../multi-stage-query/) in
`compactionConfig.taskContext` when configuring your datasource for automatic
compaction, such as setting the maximum number of tasks using the
`compactionConfig.taskContext.maxNumTasks` parameter. Some of the MSQ task
engine context parameters overlap with automatic compaction parameters. When
these settings overlap, set one or the other.
+
+To submit an automatic compaction task, you submit a supervisor spec through
the UI or API with the type `autocompact` and the `spec` where you define the
compaction behavior using the [automatic compaction
syntax](#automatic-compaction-syntax). You can use the [web
console](#ui-for-overlord-based-compaction)
+
+### UI for Overlord-based compaction
+
+To submit a supervisor spec for MSQ task engine automatic compaction, perform
the following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+ - The type of supervisor spec by setting `"type": "autocompact"`
+ - The compaction configuration by adding it to the `spec` field
+ ```json
+ {
+ "type": "autocompact",
+ "spec": {
+ "dataSource": YOUR_DATASOURCE,
+ ...
+ ...
+ }
+ ```
+1. Submit the supervisor.
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
+
+### API for Overlord-based compaction
+
+Submitting an automatic compaction as a supervisor task uses the same endpoint
as supervisor tasks for streaming ingestion.
+
+The following example configures auto-compaction for the `wikipedia`
datasource:
+
+```sh
+curl --location --request POST
'http://localhost:8081/druid/indexer/v1/supervisor' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+ "type": "autocompact", // required
+ "suspended": false, // optional
+ "spec": { // required
+ "dataSource": "wikipedia", // required
+ "tuningConfig": {...}, // optional
+ "granularitySpec": {...}, // optional
+ ...
+ }
+}'
+```
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
+
+### MSQ task engine limitations
+
+When using the MSQ task engine for auto-compaction, keep the following
limitations in mind:
+
+- The `metricSpec` field is only supported for idempotent aggregators. For
more information, see [Idempotent aggregators](#idempotent-aggregators).
+- Only dynamic and range-based partitioning are supported
+- Set `rollup` to `true` if `metricSpec` is not empty or null. If
`metricSpec` is empty or null, set `rollup` to `false`.
Review Comment:
If `metricSpec` is empty or null, a user can set `rollup` to either `false` or
null, so we can omit that part.
```suggestion
- Set `rollup` to `true` if and only if `metricSpec` is not empty or null.
```
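As a sanity check on the wording, the "if and only if" rule in the suggestion above can be sketched as follows. This is only an illustration: `rollup_is_consistent` is a hypothetical name, and the sketch assumes a null `rollup` behaves like `false`, as described in this comment.

```python
from typing import Optional, Sequence


def rollup_is_consistent(rollup: Optional[bool],
                         metric_spec: Optional[Sequence[dict]]) -> bool:
    """Return True when `rollup` is true exactly when `metricSpec`
    is neither empty nor null."""
    has_metrics = bool(metric_spec)  # None and [] both count as "empty or null"
    wants_rollup = bool(rollup)      # None is treated like False (assumption)
    return wants_rollup == has_metrics


metrics = [{"name": "added", "type": "longSum", "fieldName": "added"}]

print(rollup_is_consistent(True, metrics))   # metrics present, rollup on
print(rollup_is_consistent(None, None))      # no metrics, rollup left unset
print(rollup_is_consistent(True, []))        # rollup on without metrics
```

The first two combinations satisfy the rule and the third does not, which matches the intent of dropping the second sentence from the bullet.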
##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
~ under the License.
-->
-In Apache Druid, compaction is a special type of ingestion task that reads
data from a Druid datasource and writes it back into the same datasource. A
common use case for this is to [optimally size
segments](../operations/segment-optimization.md) after ingestion to improve
query performance. Automatic compaction, or auto-compaction, refers to the
system for automatic execution of compaction tasks managed by the [Druid
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid
cluster. See the [examples](#examples) for common use cases for automatic
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing
period](../configuration/index.md#coordinator-operation),
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first
checks for segments to compact at most 30 minutes from when auto-compaction is
enabled.
-This time period affects other Coordinator duties including merge and
conversion tasks.
-To configure the auto-compaction time period without interfering with
`indexingPeriod`, see [Set frequency of compaction
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads
data from a Druid datasource and writes it back into the same datasource. A
common use case for this is to [optimally size
segments](../operations/segment-optimization.md) after ingestion to improve
query performance. Automatic compaction, or auto-compaction, refers to the
system for automatic execution of compaction tasks managed by the [Druid
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
-At every invocation of auto-compaction, the Coordinator initiates a [segment
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction)
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the
Overlord and MSQ task engine for compaction provides faster compaction times as
well as better memory tuning and usage. Both methods use the same syntax, but
you use different methods to submit the automatic compaction.
:::info
Auto-compaction skips datasources that have a segment granularity of `ALL`.
:::
As a best practice, you should set up auto-compaction for all Druid
datasources. You can run compaction tasks manually for cases where you want to
allocate more system resources. For example, you may choose to run multiple
compaction tasks in parallel to compact an existing datasource for the first
time. See [Compaction](compaction.md) for additional details and use cases.
+This topic guides you through setting up automatic compaction for your Druid
cluster. See the [examples](#examples) for common use cases for automatic
compaction.
-## Enable automatic compaction
+## Coordinator-based
-You can enable automatic compaction for a datasource using the web console or
programmatically via an API.
-This process differs for manual compaction tasks, which can be submitted from
the [Tasks view of the web console](../operations/web-console.md) or the [Tasks
API](../api-reference/tasks-api.md).
+The Coordinator [indexing
period](../configuration/index.md#coordinator-operation),
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction
tasks.
+The default indexing period is 30 minutes, meaning that the Coordinator first
checks for segments to compact at most 30 minutes from when auto-compaction is
enabled.
+This time period affects other Coordinator duties including merge and
conversion tasks.
+To configure the auto-compaction time period without interfering with
`indexingPeriod`, see [Set frequency of compaction runs](#compaction-frequency).
-### Web console
+At every invocation of auto-compaction, the Coordinator initiates a [segment
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction)
to determine eligible segments to compact.
+When there are eligible segments to compact, the Coordinator issues compaction
tasks based on available worker capacity.
+If a compaction task takes longer than the indexing period, the Coordinator
waits for it to finish before resuming the period for segment search.
-Use the web console to enable automatic compaction for a datasource as follows.
+No additional configuration is needed to run automatic compaction tasks using
the Coordinator and native engine. This is the default behavior for Druid.
-1. Click **Datasources** in the top-level navigation.
-2. In the **Compaction** column, click the edit icon for the datasource to
compact.
-3. In the **Compaction config** dialog, configure the auto-compaction
settings. The dialog offers a form view as well as a JSON view. Editing the
form updates the JSON specification, and editing the JSON updates the form
field, if present. Form fields not present in the JSON indicate default values.
You may add additional properties to the JSON for auto-compaction settings not
displayed in the form. See [Configure automatic
compaction](#configure-automatic-compaction) for supported settings for
auto-compaction.
-4. Click **Submit**.
-5. Refresh the **Datasources** view. The **Compaction** column for the
datasource changes from “Not enabled” to “Awaiting first run.”
+## Overlord-based
-The following screenshot shows the compaction config dialog for a datasource
with auto-compaction enabled.
-
+You can run automatic compaction using the Overlord rather than the
Coordinator. Running compaction tasks on the Overlord means that polling the
task status and running compaction at a higher frequency is more efficient than
a comparable compaction task that runs on the Coordinator. When running
compaction tasks using the Overlord, Druid checks to see if there is data to
compact in a datasource every 5 seconds.
-To disable auto-compaction for a datasource, click **Delete** from the
**Compaction config** dialog. Druid does not retain your auto-compaction
configuration.
+* In your Overlord runtime properties, set the following properties:
+ * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks
can be run as a supervisor task
+ * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ
task engine as the compaction engine or to `native`.
-### Compaction configuration API
+After making these changes, you can submit automatic compaction tasks as
supervisors. For more general information about supervisors, see
[Supervisors](../ingestion/supervisor.md).
-Use the [Automatic compaction
API](../api-reference/automatic-compaction-api.md#manage-automatic-compaction)
to configure automatic compaction.
-To enable auto-compaction for a datasource, create a JSON object with the
desired auto-compaction settings.
-See [Configure automatic compaction](#configure-automatic-compaction) for the
syntax of an auto-compaction spec.
-Send the JSON object as a payload in a [`POST`
request](../api-reference/automatic-compaction-api.md#create-or-update-automatic-compaction-configuration)
to `/druid/coordinator/v1/config/compaction`.
-The following example configures auto-compaction for the `wikipedia`
datasource:
-
-```sh
-curl --location --request POST
'http://localhost:8081/druid/coordinator/v1/config/compaction' \
---header 'Content-Type: application/json' \
---data-raw '{
- "dataSource": "wikipedia",
- "granularitySpec": {
- "segmentGranularity": "DAY"
- }
-}'
-```
-
-To disable auto-compaction for a datasource, send a [`DELETE`
request](../api-reference/automatic-compaction-api.md#remove-automatic-compaction-configuration)
to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace
`{dataSource}` with the name of the datasource for which to disable
auto-compaction. For example:
-
-```sh
-curl --location --request DELETE
'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
-```
-
-## Configure automatic compaction
+## Automatic compaction syntax
You can configure automatic compaction dynamically without restarting Druid.
-The automatic compaction system uses the following syntax:
+Both the native and MSQ task engine automatic compaction engines use the
following syntax:
Review Comment:
Agreed. Since the existing flow is unchanged and everything newly added is
currently experimental, we should move all of the new content into its own
section at the end of the doc.
##########
docs/data-management/automatic-compaction.md:
##########
@@ -142,6 +163,108 @@ druid.coordinator.compaction.duties=["compactSegments"]
druid.coordinator.compaction.period=PT60S
```
+## Use Overlord-based automatic compaction
+
+When you use the Overlord for automatic compaction, Druid uses a supervisor
task on the Overlord to perform the compaction. Since it's a supervisor task,
automatic compaction using the Overlord can run frequently while providing
faster compaction times as well as better memory tuning and usage.
+
+When you use Overlord-based automatic compaction, you can use either the
native engine like Coordinator-based automatic compaction or the [MSQ task
engine](#use-msq-for-automatic-compaction).
+
+By default, Druid checks every 5 seconds to see whether or not compaction is
required.
+
+### Use MSQ for automatic compaction
+
+The MSQ task engine is available as a compaction engine if you configure
compaction tasks to run on the Overlord as a supervisor. To use the MSQ task
engine for automatic compaction, make sure the following requirements are met:
+
+* Have the [MSQ task engine extension
loaded](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+ * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks
can be run as a supervisor task
+ * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ
task engine as the compaction engine
+* Have at least two compaction task slots available or set
`compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine
requires at least two tasks to run, one controller task and one worker task.
+
+You can use [MSQ task engine context parameters](../multi-stage-query/) in
`compactionConfig.taskContext` when configuring your datasource for automatic
compaction, such as setting the maximum number of tasks using the
`compactionConfig.taskContext.maxNumTasks` parameter. Some of the MSQ task
engine context parameters overlap with automatic compaction parameters. When
these settings overlap, set one or the other.
+
+To submit an automatic compaction task, you submit a supervisor spec through
the UI or API with the type `autocompact` and the `spec` where you define the
compaction behavior using the [automatic compaction
syntax](#automatic-compaction-syntax). You can use the [web
console](#ui-for-overlord-based-compaction)
+
+### UI for Overlord-based compaction
+
+To submit a supervisor spec for MSQ task engine automatic compaction, perform
the following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+ - The type of supervisor spec by setting `"type": "autocompact"`
+ - The compaction configuration by adding it to the `spec` field
+ ```json
+ {
+ "type": "autocompact",
+ "spec": {
+ "dataSource": YOUR_DATASOURCE,
+ ...
+ ...
+ }
+ ```
+1. Submit the supervisor.
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
+
+### API for Overlord-based compaction
+
+Submitting an automatic compaction as a supervisor task uses the same endpoint
as supervisor tasks for streaming ingestion.
+
+The following example configures auto-compaction for the `wikipedia`
datasource:
+
+```sh
+curl --location --request POST
'http://localhost:8081/druid/indexer/v1/supervisor' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+ "type": "autocompact", // required
+ "suspended": false, // optional
+ "spec": { // required
+ "dataSource": "wikipedia", // required
+ "tuningConfig": {...}, // optional
+ "granularitySpec": {...}, // optional
+ ...
+ }
+}'
+```
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
+
+### MSQ task engine limitations
+
+When using the MSQ task engine for auto-compaction, keep the following
limitations in mind:
+
+- The `metricSpec` field is only supported for idempotent aggregators. For
more information, see [Idempotent aggregators](#idempotent-aggregators).
+- Only dynamic and range-based partitioning are supported
+- Set `rollup` to `true` if `metricSpec` is not empty or null. If
`metricSpec` is empty or null, set `rollup` to `false`.
+- You cannot group on multi-value dimensions
Review Comment:
The limitation on group-by with multi-value dimensions (MVDs) has been removed,
so I'm replacing this bullet with a different limitation.
```suggestion
- You can only partition on string dimensions. However, multi-valued string
dimensions are not supported.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]