(druid) branch 31.0.0 updated: [Backport]docs: backport msq autocompact docs [#16681] (#17374)

victoria Thu, 17 Oct 2024 18:05:31 -0700

This is an automated email from the ASF dual-hosted git repository.

victoria pushed a commit to branch 31.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git



The following commit(s) were added to refs/heads/31.0.0 by this push:
     new 26d5e950730 [Backport]docs: backport msq autocompact docs [#16681] (#17374)
26d5e950730 is described below

commit 26d5e95073058b9adf3827b676e0df81175bdc6e
Author: 317brian <[email protected]>
AuthorDate: Thu Oct 17 18:05:20 2024 -0700

    [Backport]docs: backport msq autocompact docs [#16681] (#17374)
    
    Co-authored-by: Kashif Faraz <[email protected]>
    Co-authored-by: Vishesh Garg <[email protected]>
    Co-authored-by: Victoria Lim <[email protected]>
---
 docs/api-reference/automatic-compaction-api.md |   8 +-
 docs/configuration/index.md                    |   3 +-
 docs/data-management/automatic-compaction.md   | 276 ++++++++++++++++++-------
 docs/ingestion/concurrent-append-replace.md    |   2 +-
 docs/ingestion/supervisor.md                   |  12 +-
 docs/multi-stage-query/known-issues.md         |  13 ++
 website/.spelling                              |   1 +
 7 files changed, 237 insertions(+), 78 deletions(-)

diff --git a/docs/api-reference/automatic-compaction-api.md b/docs/api-reference/automatic-compaction-api.md
index a443e108639..3ad90c9d339 100644
--- a/docs/api-reference/automatic-compaction-api.md
+++ b/docs/api-reference/automatic-compaction-api.md
@@ -27,7 +27,13 @@ import TabItem from '@theme/TabItem';
   ~ under the License.
   -->
 
-This topic describes the status and configuration API endpoints for [automatic 
compaction](../data-management/automatic-compaction.md) in Apache Druid. You 
can configure automatic compaction in the Druid web console or API.
+This topic describes the status and configuration API endpoints for [automatic 
compaction using Coordinator 
duties](../data-management/automatic-compaction.md#auto-compaction-using-coordinator-duties)
 in Apache Druid. You can configure automatic compaction in the Druid web 
console or API.
+
+:::info Experimental
+
+Instead of the automatic compaction API, you can use the supervisor API to 
submit auto-compaction jobs using compaction supervisors. For more information, 
see [Auto-compaction using compaction 
supervisors](../data-management/automatic-compaction.md#auto-compaction-using-compaction-supervisors).
+
+:::
 
 In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router 
service address and port. Replace it with the information for your deployment. 
For example, use `http://localhost:8888` for quickstart deployments.
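
A minimal sketch of calling one of these endpoints through the Router. It assumes a quickstart deployment at `http://localhost:8888` and a datasource named `wikipedia`. The helper name `compaction_status_url` is hypothetical. It only builds the `GET` URL for `/druid/coordinator/v1/compaction/status`, which the automatic compaction page describes, with an optional `dataSource` query parameter to filter the results.

```sh
#!/bin/sh
# Hypothetical helper (not part of Druid): print the URL of the Coordinator
# compaction status endpoint, reached through the Router.
#   $1: Router base URL; defaults to the quickstart address
#   $2: optional datasource name, assumed not to need URL encoding
compaction_status_url() {
  router="${1:-http://localhost:8888}"
  datasource="${2:-}"
  if [ -n "$datasource" ]; then
    printf '%s/druid/coordinator/v1/compaction/status?dataSource=%s\n' \
      "$router" "$datasource"
  else
    printf '%s/druid/coordinator/v1/compaction/status\n' "$router"
  fi
}

# Usage against a running cluster (not executed here):
#   curl --silent "$(compaction_status_url http://localhost:8888 wikipedia)"
```

Building the URL separately from the `curl` call keeps the example testable without a running cluster.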
 
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index a370cd7a411..a4b7789199c 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -1046,7 +1046,7 @@ The following table shows the supported configurations 
for auto-compaction.
 
 |Property|Description|Required|
 |--------|-----------|--------|
-|type|The task type, this should always be `index_parallel`.|yes|
+|type|The task type. If you're using Coordinator duties for auto-compaction, 
set it to `index_parallel`. If you're using compaction supervisors, set it to 
`autocompact`. |yes|
 |`maxRowsInMemory`|Used in determining when intermediate persists to disk 
should occur. Normally user does not need to set this, but depending on the 
nature of data, if rows are short in terms of bytes, user may not want to store 
a million rows in memory and this value should be set.|no (default = 1000000)|
 |`maxBytesInMemory`|Used in determining when intermediate persists to disk 
should occur. Normally this is computed internally and user does not need to 
set it. This value represents number of bytes to aggregate in heap memory 
before persisting. This is based on a rough estimate of memory usage and not 
actual usage. The maximum heap memory usage for indexing is `maxBytesInMemory` 
* (2 + `maxPendingPersists`)|no (default = 1/6 of max JVM memory)|
 |`splitHintSpec`|Used to give a hint to control the amount of data that each 
first phase task reads. This hint could be ignored depending on the 
implementation of the input source. See [Split hint 
spec](../ingestion/native-batch.md#split-hint-spec) for more details.|no 
(default = size-based split hint spec)|
@@ -1063,6 +1063,7 @@ The following table shows the supported configurations 
for auto-compaction.
 |`taskStatusCheckPeriodMs`|Polling period in milliseconds to check running 
task statuses.|no (default = 1000)|
 |`chatHandlerTimeout`|Timeout for reporting the pushed segments in worker 
tasks.|no (default = PT10S)|
 |`chatHandlerNumRetries`|Retries for reporting the pushed segments in worker 
tasks.|no (default = 5)|
+|`engine` | Engine for compaction. Can be either `native` or `msq`. `msq` uses the MSQ task engine and is only supported with [compaction supervisors](../data-management/automatic-compaction.md#auto-compaction-using-compaction-supervisors). | no (default = native)|
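
The `engine` row above notes that `msq` is only supported with compaction supervisors. The automatic compaction changes below explain that you enable compaction supervisors with Overlord runtime properties: `druid.supervisor.compaction.enabled` and, optionally, `druid.supervisor.compaction.engine`. Below is a rough sketch that appends them to an Overlord `runtime.properties` file you pass in. The function name is hypothetical. Druid reads runtime properties at startup, so restart the Overlord afterwards.

```sh
#!/bin/sh
# Hypothetical helper (not part of Druid): append the compaction supervisor
# properties to an Overlord runtime.properties file.
#   $1: path to the Overlord runtime.properties file
#   $2: default compaction engine, "native" (the default here) or "msq";
#       "msq" also requires the MSQ task engine extension to be loaded
# The helper does not check for existing entries with the same keys.
enable_compaction_supervisors() {
  props_file="$1"
  engine="${2:-native}"
  case "$engine" in
    native|msq) ;;
    *)
      echo "unsupported compaction engine: $engine" >&2
      return 1
      ;;
  esac
  {
    echo "druid.supervisor.compaction.enabled=true"
    echo "druid.supervisor.compaction.engine=$engine"
  } >> "$props_file"
}

# Usage (placeholder path; use your deployment's Overlord config file):
#   enable_compaction_supervisors /path/to/overlord/runtime.properties msq
```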
 
 ###### Automatic compaction granularitySpec
 
diff --git a/docs/data-management/automatic-compaction.md b/docs/data-management/automatic-compaction.md
index 4fe49f8beb5..cf129ea1ee2 100644
--- a/docs/data-management/automatic-compaction.md
+++ b/docs/data-management/automatic-compaction.md
@@ -22,19 +22,7 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
-
-At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks issued by Druid itself. In 
addition to auto-compaction, you can perform [manual compaction]( [...]
 
 :::info
  Auto-compaction skips datasources that have a segment granularity of `ALL`.
@@ -42,53 +30,9 @@ If a compaction task takes longer than the indexing period, 
the Coordinator wait
 
 As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
 
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
 
-## Enable automatic compaction
-
-You can enable automatic compaction for a datasource using the web console or 
programmatically via an API.
-This process differs for manual compaction tasks, which can be submitted from 
the [Tasks view of the web console](../operations/web-console.md) or the [Tasks 
API](../api-reference/tasks-api.md).
-
-### Web console
-
-Use the web console to enable automatic compaction for a datasource as follows.
-
-1. Click **Datasources** in the top-level navigation.
-2. In the **Compaction** column, click the edit icon for the datasource to 
compact.
-3. In the **Compaction config** dialog, configure the auto-compaction 
settings. The dialog offers a form view as well as a JSON view. Editing the 
form updates the JSON specification, and editing the JSON updates the form 
field, if present. Form fields not present in the JSON indicate default values. 
You may add additional properties to the JSON for auto-compaction settings not 
displayed in the form. See [Configure automatic 
compaction](#configure-automatic-compaction) for supported setti [...]
-4. Click **Submit**.
-5. Refresh the **Datasources** view. The **Compaction** column for the 
datasource changes from “Not enabled” to “Awaiting first run.”
-
-The following screenshot shows the compaction config dialog for a datasource 
with auto-compaction enabled.
-![Compaction config in web console](../assets/compaction-config.png)
-
-To disable auto-compaction for a datasource, click **Delete** from the 
**Compaction config** dialog. Druid does not retain your auto-compaction 
configuration.
-
-### Compaction configuration API
-
-Use the [Automatic compaction 
API](../api-reference/automatic-compaction-api.md#manage-automatic-compaction) 
to configure automatic compaction.
-To enable auto-compaction for a datasource, create a JSON object with the 
desired auto-compaction settings.
-See [Configure automatic compaction](#configure-automatic-compaction) for the 
syntax of an auto-compaction spec.
-Send the JSON object as a payload in a [`POST` 
request](../api-reference/automatic-compaction-api.md#create-or-update-automatic-compaction-configuration)
 to `/druid/coordinator/v1/config/compaction`.
-The following example configures auto-compaction for the `wikipedia` 
datasource:
-
-```sh
-curl --location --request POST 
'http://localhost:8081/druid/coordinator/v1/config/compaction' \
---header 'Content-Type: application/json' \
---data-raw '{
-    "dataSource": "wikipedia",
-    "granularitySpec": {
-        "segmentGranularity": "DAY"
-    }
-}'
-```
-
-To disable auto-compaction for a datasource, send a [`DELETE` 
request](../api-reference/automatic-compaction-api.md#remove-automatic-compaction-configuration)
 to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace 
`{dataSource}` with the name of the datasource for which to disable 
auto-compaction. For example:
-
-```sh
-curl --location --request DELETE 
'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
-```
-
-## Configure automatic compaction
+## Auto-compaction syntax
 
 You can configure automatic compaction dynamically without restarting Druid.
 The automatic compaction system uses the following syntax:
@@ -108,6 +52,14 @@ The automatic compaction system uses the following syntax:
 }
 ```
 
+:::info Experimental
+
+The MSQ task engine is available as a compaction engine when you run automatic 
compaction as a compaction supervisor. For more information, see 
[Auto-compaction using compaction 
supervisors](#auto-compaction-using-compaction-supervisors).
+
+:::
+
+For automatic compaction using Coordinator duties, you submit the spec to the 
[Compaction config UI](#manage-auto-compaction-using-the-web-console) or the 
[Compaction configuration API](#manage-auto-compaction-using-coordinator-apis).
+
 Most fields in the auto-compaction configuration correlate to a typical [Druid 
ingestion spec](../ingestion/ingestion-spec.md).
 The following properties only apply to auto-compaction:
 * `skipOffsetFromLatest`
@@ -131,7 +83,62 @@ maximize performance and minimize disk usage of the 
`compact` tasks launched by
 
 For more details on each of the specs in an auto-compaction configuration, see 
[Automatic compaction dynamic 
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
 
-### Set frequency of compaction runs
+## Auto-compaction using Coordinator duties
+
+You can control how often the Coordinator checks to see if auto-compaction is 
needed. The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+This time period also affects other Coordinator duties such as cleanup of 
unused segments and stale pending segments.
+To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Change compaction frequency](#change-compaction-frequency).
+
+At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
+When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
+If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+
+No additional configuration is needed to run automatic compaction tasks using 
the Coordinator and native engine. This is the default behavior for Druid.
+You can configure it for a datasource through the web console or 
programmatically via an API.
+This process differs from that for manual compaction tasks, which you can submit from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../api-reference/tasks-api.md).
+
+### Manage auto-compaction using the web console
+
+Use the web console to enable automatic compaction for a datasource as follows:
+
+1. Click **Datasources** in the top-level navigation.
+2. In the **Compaction** column, click the edit icon for the datasource to 
compact.
+3. In the **Compaction config** dialog, configure the auto-compaction settings. The dialog offers a form view as well as a JSON view. Editing the form updates the JSON specification, and editing the JSON updates the form fields, if present. Form fields not present in the JSON indicate default values. You may add additional properties to the JSON for auto-compaction settings not displayed in the form. See [Auto-compaction syntax](#auto-compaction-syntax) for supported settings for  [...]
+4. Click **Submit**.
+5. Refresh the **Datasources** view. The **Compaction** column for the 
datasource changes from “Not enabled” to “Awaiting first run.”
+
+The following screenshot shows the compaction config dialog for a datasource 
with auto-compaction enabled.
+![Compaction config in web console](../assets/compaction-config.png)
+
+To disable auto-compaction for a datasource, click **Delete** from the 
**Compaction config** dialog. Druid does not retain your auto-compaction 
configuration.
+
+### Manage auto-compaction using Coordinator APIs  
+
+Use the [Automatic compaction 
API](../api-reference/automatic-compaction-api.md#manage-automatic-compaction) 
to configure automatic compaction.
+To enable auto-compaction for a datasource, create a JSON object with the 
desired auto-compaction settings.
+See [Auto-compaction syntax](#auto-compaction-syntax) for the syntax of an auto-compaction spec.
+Send the JSON object as a payload in a [`POST` 
request](../api-reference/automatic-compaction-api.md#create-or-update-automatic-compaction-configuration)
 to `/druid/coordinator/v1/config/compaction`.
+The following example configures auto-compaction for the `wikipedia` 
datasource:
+
+```sh
+curl --location --request POST 
'http://localhost:8081/druid/coordinator/v1/config/compaction' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+    "dataSource": "wikipedia",
+    "granularitySpec": {
+        "segmentGranularity": "DAY"
+    }
+}'
+```
+
+To disable auto-compaction for a datasource, send a [`DELETE` 
request](../api-reference/automatic-compaction-api.md#remove-automatic-compaction-configuration)
 to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace 
`{dataSource}` with the name of the datasource for which to disable 
auto-compaction. For example:
+
+```sh
+curl --location --request DELETE 
'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
+```
+
+### Change compaction frequency
 
 If you want the Coordinator to check for compaction more frequently than its 
indexing period, create a separate group to handle compaction duties.
 Set the time period of the duty group in the `coordinator/runtime.properties` 
file.
@@ -142,6 +149,15 @@ druid.coordinator.compaction.duties=["compactSegments"]
 druid.coordinator.compaction.period=PT60S
 ```
 
+### View Coordinator duty auto-compaction stats
+
+After the Coordinator has initiated auto-compaction, you can view compaction 
statistics for the datasource, including the number of bytes, segments, and 
intervals already compacted and those awaiting compaction. The Coordinator also 
reports the total bytes, segments, and intervals not eligible for compaction in 
accordance with its [segment search 
policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction).
+
+In the web console, the Datasources view displays auto-compaction statistics. 
The Tasks view shows the task information for compaction tasks that were 
triggered by the automatic compaction system.
+
+To get statistics by API, send a [`GET` 
request](../api-reference/automatic-compaction-api.md#view-automatic-compaction-status)
 to `/druid/coordinator/v1/compaction/status`. To filter the results to a 
particular datasource, pass the datasource name as a query parameter to the 
request—for example, 
`/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.
+
+
 ## Avoid conflicts with ingestion
 
 Compaction tasks may be interrupted when they interfere with ingestion. For 
example, this occurs when an ingestion task needs to write data to a segment 
for a time interval locked for compaction. If there are continuous failures 
that prevent compaction from making progress, consider one of the following 
strategies:
@@ -169,15 +185,6 @@ The Coordinator compacts segments from newest to oldest. 
In the auto-compaction
 
 To set `skipOffsetFromLatest`, consider how frequently you expect the stream 
to receive late arriving data. If your stream only occasionally receives late 
arriving data, the auto-compaction system robustly compacts your data even 
though data is ingested outside the `skipOffsetFromLatest` window. For most 
realtime streaming ingestion use cases, it is reasonable to set 
`skipOffsetFromLatest` to a few hours or a day.
 
-
-## View automatic compaction statistics
-
-After the Coordinator has initiated auto-compaction, you can view compaction 
statistics for the datasource, including the number of bytes, segments, and 
intervals already compacted and those awaiting compaction. The Coordinator also 
reports the total bytes, segments, and intervals not eligible for compaction in 
accordance with its [segment search 
policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction).
-
-In the web console, the Datasources view displays auto-compaction statistics. 
The Tasks view shows the task information for compaction tasks that were 
triggered by the automatic compaction system.
-
-To get statistics by API, send a [`GET` 
request](../api-reference/automatic-compaction-api.md#view-automatic-compaction-status)
 to `/druid/coordinator/v1/compaction/status`. To filter the results to a 
particular datasource, pass the datasource name as a query parameter to the 
request—for example, 
`/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.
-
 ## Examples
 
 The following examples demonstrate potential use cases in which 
auto-compaction may improve your Druid performance. See more details in 
[Compaction 
strategies](../data-management/compaction.md#compaction-guidelines). The 
examples in this section do not change the underlying data.
@@ -221,6 +228,137 @@ The following auto-compaction configuration compacts 
updates the `wikipedia` seg
 }
 ```
 
+## Auto-compaction using compaction supervisors  
+
+:::info Experimental
+Compaction supervisors are experimental. For production use, we recommend 
[auto-compaction using Coordinator 
duties](#auto-compaction-using-coordinator-duties).
+:::
+
+You can run automatic compaction using compaction supervisors on the Overlord 
rather than Coordinator duties. Compaction supervisors provide the following 
benefits over Coordinator duties:
+
+* Use the supervisor framework to get information about auto-compaction, such as its status or state.
+* Suspend or resume compaction for a datasource more easily.
+* Use either the native compaction engine or the [MSQ task engine](#use-msq-for-auto-compaction).
+* Submit tasks as soon as a compaction slot is available, which makes compaction more reactive.
+* Track compaction task status to avoid re-compacting an interval repeatedly.
+
+
+To use compaction supervisors, set the following properties in your Overlord runtime properties:
+  * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks can be run as supervisor tasks.
+  * Optionally, `druid.supervisor.compaction.engine` to `msq` to use the MSQ task engine or to `native` to use the native engine. Druid uses this engine for any compaction supervisor whose spec omits the `engine` field. If you don't set this property, the default is `native`.
+
+Compaction supervisors use the same syntax as auto-compaction using Coordinator duties with one key difference: you submit the auto-compaction as a supervisor spec. In the spec, set the `type` to `autocompact` and include the auto-compaction config in the `spec` field.
+
+You can submit a compaction supervisor spec through the [web console](#manage-compaction-supervisors-with-the-web-console) or the [supervisor API](#manage-compaction-supervisors-with-supervisor-apis).
+
+
+### Manage compaction supervisors with the web console
+
+To submit a compaction supervisor spec, perform the following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+     - The type of supervisor spec, set with `"type": "autocompact"`
+     - The compaction configuration, added to the `spec` field. For example, the following spec compacts the `wikipedia` datasource into `DAY` segments using the native engine. Set `engine` to `msq` to use the MSQ task engine instead.
+    ```json
+    {
+      "type": "autocompact",
+      "spec": {
+        "dataSource": "wikipedia",
+        "granularitySpec": {
+          "segmentGranularity": "DAY"
+        },
+        "engine": "native"
+      }
+    }
+    ```
+1. Submit the supervisor.
+
+To stop automatic compaction for the datasource, suspend or terminate the supervisor through the web console or the supervisor API.
+
+### Manage compaction supervisors with supervisor APIs
+
+Submitting a compaction supervisor uses the same endpoint, `/druid/indexer/v1/supervisor`, as submitting a streaming ingestion supervisor.
+
+The following example configures auto-compaction for the `wikipedia` datasource. In the payload, `type` and `spec` (including `spec.dataSource`) are required. The `suspended`, `spec.tuningConfig`, `spec.granularitySpec`, and `spec.engine` fields are optional.
+
+```sh
+curl --location --request POST 'http://localhost:8081/druid/indexer/v1/supervisor' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+   "type": "autocompact",
+   "suspended": false,
+   "spec": {
+       "dataSource": "wikipedia",
+       "granularitySpec": {
+           "segmentGranularity": "DAY"
+       },
+       "engine": "native"
+   }
+}'
+```
+
+If you omit `spec.engine`, Druid uses the default compaction engine, which you can set with the `druid.supervisor.compaction.engine` Overlord runtime property. If you omit both `spec.engine` and `druid.supervisor.compaction.engine`, Druid uses the native engine.
+
+To stop automatic compaction for the datasource, suspend or terminate the supervisor through the web console or the supervisor API.
+
+### Use MSQ for auto-compaction
+
+The MSQ task engine is available as a compaction engine if you configure 
auto-compaction to use compaction supervisors. To use the MSQ task engine for 
automatic compaction, make sure the following requirements are met:
+
+* [Load the MSQ task engine 
extension](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+  * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks can be run as supervisor tasks.
+  * Optionally, `druid.supervisor.compaction.engine` to `msq` to specify the MSQ task engine as the default compaction engine. If you don't do this, you'll need to set `spec.engine` to `msq` for each compaction supervisor spec where you want to use the MSQ task engine.
+* Have at least two compaction task slots available or set `spec.taskContext.maxNumTasks` to two or more. The MSQ task engine requires at least two tasks to run: one controller task and one worker task.
+
+You can use [MSQ task engine context 
parameters](../multi-stage-query/reference.md#context-parameters) in 
`spec.taskContext` when configuring your datasource for automatic compaction, 
such as setting the maximum number of tasks using the 
`spec.taskContext.maxNumTasks` parameter. Some of the MSQ task engine context 
parameters overlap with automatic compaction parameters. When these settings 
overlap, set one or the other.
+
+
+#### MSQ task engine limitations
+
+<!--This list also exists in multi-stage-query/known-issues-->
+
+When using the MSQ task engine for auto-compaction, keep the following 
limitations in mind:
+
+- The `metricSpec` field is only supported for certain aggregators. For more 
information, see [Supported aggregators](#supported-aggregators).
+- Only dynamic and range-based partitioning are supported.
+- Set `rollup` to `true` if and only if `metricSpec` is neither empty nor null.
+- You can only partition on string dimensions. However, multi-valued string 
dimensions are not supported.
+- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use 
`maxRowsPerSegment` instead.
+- Segments can only be sorted on `__time` as the first column.
+
+#### Supported aggregators
+
+Auto-compaction using the MSQ task engine supports only aggregators that satisfy the following properties:
+* __Mergeability__: can combine partial aggregates
+* __Idempotency__: produces the same results on repeated runs of the aggregator on previously aggregated values in a column
+
+For example, the following `longSum` aggregator satisfies both properties:
+
+```json
+{"name": "added", "type": "longSum", "fieldName": "added"}
+```
+
+`longSum` can combine partial results, which satisfies mergeability. The input and output column are the same (`added`), which ensures idempotency.
+
+The following are some examples of aggregators that aren't supported because they don't satisfy at least one of the required properties:
+
+* A `longSum` aggregator that rolls up the `added` column into a `sum_added` column and discards the input `added` column. This violates idempotency because subsequent runs can no longer find the `added` column:
+    ```json
+    {"name": "sum_added", "type": "longSum", "fieldName": "added"}
+    ```
+* Partial sketch aggregators that can't combine partial aggregates themselves and need a separate merging aggregator, such as `HLLSketchBuild`, which requires `HLLSketchMerge`. This violates mergeability:
+    ```json
+    {"name": "added", "type": "HLLSketchBuild", "fieldName": "added"}
+    ```
+* The `count` aggregator, because it can't combine partial aggregates and it rolls up into a different `count` column, discarding the input columns. This violates both mergeability and idempotency:
+    ```json
+    {"type": "count", "name": "count"}
+    ```
+
+
+
 ## Learn more
 
 See the following topics for more information:
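
The compaction supervisor workflow described above builds an `autocompact` supervisor spec and sends it in a `POST` request to `/druid/indexer/v1/supervisor`. Below is a minimal sketch of that workflow. The function names, the Router address `http://localhost:8888`, and the `DAY` segment granularity are assumptions for illustration. When no engine is given, the builder omits `spec.engine`. Druid then falls back to `druid.supervisor.compaction.engine` or, if that property is unset, to the native engine.

```sh
#!/bin/sh
# Hypothetical helper (not part of Druid): print an autocompact supervisor
# spec as JSON.
#   $1: datasource name, assumed to need no JSON escaping
#   $2: optional engine, "native" or "msq"; omitted from the spec when empty
autocompact_supervisor_spec() {
  datasource="$1"
  engine="${2:-}"
  engine_field=""
  if [ -n "$engine" ]; then
    engine_field=$(printf ', "engine": "%s"' "$engine")
  fi
  printf '{"type": "autocompact", "suspended": false, "spec": {"dataSource": "%s", "granularitySpec": {"segmentGranularity": "DAY"}%s}}\n' \
    "$datasource" "$engine_field"
}

# Hypothetical helper: submit the spec through the Router (assumed address).
#   $1: datasource name, $2: optional engine; ROUTER overrides the base URL
submit_autocompact_supervisor() {
  autocompact_supervisor_spec "$1" "${2:-}" |
    curl --silent --show-error --request POST \
      --header 'Content-Type: application/json' \
      --data @- \
      "${ROUTER:-http://localhost:8888}/druid/indexer/v1/supervisor"
}

# Usage against a running cluster (not executed here):
#   submit_autocompact_supervisor wikipedia msq
```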
diff --git a/docs/ingestion/concurrent-append-replace.md b/docs/ingestion/concurrent-append-replace.md
index 0ac5b881564..5468bc28c5c 100644
--- a/docs/ingestion/concurrent-append-replace.md
+++ b/docs/ingestion/concurrent-append-replace.md
@@ -34,7 +34,7 @@ If you want to append data to a datasource while compaction 
is running, you need
 
 In the **Compaction config** for a datasource, enable  **Use concurrent locks 
(experimental)**.
 
-For details on accessing the compaction config in the UI, see [Enable 
automatic compaction with the web 
console](../data-management/automatic-compaction.md#web-console).
+For details on accessing the compaction config in the UI, see [Manage auto-compaction using the web console](../data-management/automatic-compaction.md#manage-auto-compaction-using-the-web-console).
 
 ### Update the compaction settings with the API
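
A minimal sketch of that API update. It assumes the `useConcurrentLocks` task context flag from the concurrent append and replace documentation and the Coordinator compaction configuration endpoint shown in the automatic compaction changes above. The helper name is hypothetical, and the payload sets only `dataSource` and `taskContext`. The request is assumed to submit the whole configuration for the datasource, so merge in any other auto-compaction settings you want to keep.

```sh
#!/bin/sh
# Hypothetical helper (not part of Druid): print a compaction config that
# enables concurrent locks for one datasource.
#   $1: datasource name, assumed to need no JSON escaping
concurrent_locks_compaction_config() {
  printf '{"dataSource": "%s", "taskContext": {"useConcurrentLocks": true}}\n' "$1"
}

# Usage against a running cluster (not executed here); the Coordinator
# endpoint is reached through the Router at an assumed address:
#   concurrent_locks_compaction_config wikipedia |
#     curl --silent --show-error --request POST \
#       --header 'Content-Type: application/json' --data @- \
#       'http://localhost:8888/druid/coordinator/v1/config/compaction'
```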
  
diff --git a/docs/ingestion/supervisor.md b/docs/ingestion/supervisor.md
index d5293ae581f..6eeed9d0854 100644
--- a/docs/ingestion/supervisor.md
+++ b/docs/ingestion/supervisor.md
@@ -23,22 +23,22 @@ sidebar_label: Supervisor
   ~ under the License.
   -->
 
-A supervisor manages streaming ingestion from external streaming sources into 
Apache Druid.
-Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained.
+Apache Druid uses supervisors to manage streaming ingestion from external 
streaming sources into Druid.
+Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained. They can also be used to perform [automatic 
compaction](../data-management/automatic-compaction.md) after data has been 
ingested.
 
 This topic uses the Apache Kafka term offset to refer to the identifier for 
records in a partition. If you are using Amazon Kinesis, the equivalent is 
sequence number.
 
 ## Supervisor spec
 
-Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks.
-The supervisor spec specifies how Druid should consume, process, and index 
streaming data.
+Druid uses a JSON specification, often referred to as the supervisor spec, to 
define tasks used for streaming ingestion or auto-compaction.
+The supervisor spec specifies how Druid should consume, process, and index 
data from an external stream or Druid itself.
 
 The following table outlines the high-level configuration options for a 
supervisor spec:
 
 |Property|Type|Description|Required|
 |--------|----|-----------|--------|
-|`type`|String|The supervisor type. One of `kafka`or `kinesis`.|Yes|
-|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`type`|String|The supervisor type. For streaming ingestion, this can be `kafka`, `kinesis`, or `rabbit`. For automatic compaction, set the type to `autocompact`.|Yes|
+|`spec`|Object|The container object for the supervisor configuration. For automatic compaction, this is the same as the compaction configuration.|Yes|
 |`spec.dataSchema`|Object|The schema for the indexing task to use during 
ingestion. See [`dataSchema`](../ingestion/ingestion-spec.md#dataschema) for 
more information.|Yes|
 |`spec.ioConfig`|Object|The I/O configuration object to define the connection 
and I/O-related settings for the supervisor and indexing tasks.|Yes|
 |`spec.tuningConfig`|Object|The tuning configuration object to define 
performance-related settings for the supervisor and indexing tasks.|No|
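
Compaction supervisors run in the supervisor framework, so the standard supervisor endpoints for suspending, resuming, terminating, and checking status should apply to them too. Below is a sketch that builds those requests. It assumes the Router at `http://localhost:8888`. It also assumes that Druid names a compaction supervisor `autocompact__<datasource>`. That ID format is an assumption, so confirm the actual ID first by listing supervisors with `GET /druid/indexer/v1/supervisor`. The helper name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not part of Druid): print the HTTP method and URL for
# a supervisor lifecycle action.
#   $1: action: suspend, resume, terminate, or status
#   $2: supervisor ID, for example the assumed "autocompact__wikipedia"
#   ROUTER: optional base URL override
supervisor_action() {
  action="$1"
  supervisor_id="$2"
  base="${ROUTER:-http://localhost:8888}/druid/indexer/v1/supervisor/$supervisor_id"
  case "$action" in
    suspend|resume|terminate) echo "POST $base/$action" ;;
    status) echo "GET $base/status" ;;
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac
}

# Usage against a running cluster (not executed here):
#   set -- $(supervisor_action suspend autocompact__wikipedia)
#   curl --silent --show-error --request "$1" "$2"
```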
diff --git a/docs/multi-stage-query/known-issues.md b/docs/multi-stage-query/known-issues.md
index 39286edfcdd..81d87cb6ac1 100644
--- a/docs/multi-stage-query/known-issues.md
+++ b/docs/multi-stage-query/known-issues.md
@@ -68,3 +68,16 @@ properties, and the `indexSpec` 
[`tuningConfig`](../ingestion/ingestion-spec.md#
 - The maximum number of elements in a window cannot exceed a value of 100,000. 
 - To avoid `leafOperators` in MSQ engine, window functions have an extra scan 
stage after the window stage for cases 
 where native engine has a non-empty `leafOperator`.
+
+## Automatic compaction
+
+<!-- If you update this list, also update 
data-management/automatic-compaction.md -->
+
+The following known issues and limitations affect automatic compaction with 
the MSQ task engine:
+
+- The `metricSpec` field is only supported for certain aggregators. For more 
information, see [Supported 
aggregators](../data-management/automatic-compaction.md#supported-aggregators).
+- Only dynamic and range-based partitioning are supported.
+- Set `rollup` to `true` if and only if `metricSpec` is neither empty nor null.
+- You can only partition on string dimensions. However, multi-valued string 
dimensions are not supported.
+- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use 
`maxRowsPerSegment` instead.
+- Segments can only be sorted on `__time` as the first column.
\ No newline at end of file
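
The automatic compaction changes above say the MSQ task engine needs at least one controller task and one worker task. A spec builder can therefore refuse `maxNumTasks` values below two. Below is a sketch that assumes the `autocompact` supervisor spec shape shown in those changes. The helper name and its validation are hypothetical, and the spec sets only `dataSource`, `engine`, and `taskContext.maxNumTasks`. It still requires `druid.supervisor.compaction.enabled=true` on the Overlord and the MSQ task engine extension to be loaded.

```sh
#!/bin/sh
# Hypothetical helper (not part of Druid): print an autocompact supervisor
# spec that uses the MSQ task engine.
#   $1: datasource name, assumed to need no JSON escaping
#   $2: maxNumTasks; defaults to 2 (one controller task plus one worker task)
msq_autocompact_supervisor_spec() {
  datasource="$1"
  max_num_tasks="${2:-2}"
  # Reject empty values, non-digits, and leading zeros.
  case "$max_num_tasks" in
    ''|*[!0-9]*|0*)
      echo "maxNumTasks must be a positive integer" >&2
      return 1
      ;;
  esac
  if [ "$max_num_tasks" -lt 2 ]; then
    echo "the MSQ task engine needs maxNumTasks >= 2" >&2
    return 1
  fi
  printf '{"type": "autocompact", "spec": {"dataSource": "%s", "engine": "msq", "taskContext": {"maxNumTasks": %s}}}\n' \
    "$datasource" "$max_num_tasks"
}
```

Validating before submission surfaces the two-task requirement early, rather than leaving it to show up as a compaction run that cannot start.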
diff --git a/website/.spelling b/website/.spelling
index 8175755f804..6e204b8be8d 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -410,6 +410,7 @@ maxNumSegments
 max_map_count
 memcached
 mergeable
+mergeability
 metadata
 metastores
 millis


