Re: [PR] docs: msq autocompaction (druid)

via GitHub Thu, 26 Sep 2024 07:37:11 -0700


kfaraz commented on code in PR #16681:
URL: https://github.com/apache/druid/pull/16681#discussion_r1777057486



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
 
-At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the 
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the 
Overlord and MSQ task engine for compaction provides faster compaction times as 
well as better memory tuning and usage. Both methods use the same syntax, but 
you use different methods to submit the automatic compaction.

Review Comment:
   This passage probably doesn't belong here. MSQ auto-compaction and compaction 
supervisors (i.e., Overlord-based compaction) are experimental at the moment, so 
we should continue advising users to set up compaction via Coordinator duties.
   
   With minor changes, it can move into the `Overlord-based compaction` section.



##########
docs/ingestion/supervisor.md:
##########
@@ -23,22 +23,22 @@ sidebar_label: Supervisor
   ~ under the License.
   -->
 
-A supervisor manages streaming ingestion from external streaming sources into 
Apache Druid.
-Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained.
+Apache Druid uses supervisors to manage streaming ingestion from external 
streaming sources into Druid.
+Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained. They can also be used to for [automatic 
compaction](../data-management/automatic-compaction.md) after data is ingested. 
 
 This topic uses the Apache Kafka term offset to refer to the identifier for 
records in a partition. If you are using Amazon Kinesis, the equivalent is 
sequence number.
 
 ## Supervisor spec
 
-Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks.
-The supervisor spec specifies how Druid should consume, process, and index 
streaming data.
+Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks or compaction tasks.
+The supervisor spec specifies how Druid should consume, process, and index 
streaming data or automatic compaction tasks.
 
 The following table outlines the high-level configuration options for a 
supervisor spec:
 
 |Property|Type|Description|Required|
 |--------|----|-----------|--------|
-|`type`|String|The supervisor type. One of `kafka`or `kinesis`.|Yes|
-|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`type`|String|The supervisor type. For streaming ingestion, either `kafka`or 
`kinesis`. For automatic compaction, set the type to `autocompact`. |Yes|
+|`spec`|Object|The container object for the supervisor configuration. For 
automatic compaction, this is your compaction configuration. |Yes|

Review Comment:
   ```suggestion
   |`spec`|Object|The container object for the supervisor configuration. For 
automatic compaction, this is the same as the compaction configuration. |Yes|
   ```
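   
   For illustration only, a compaction supervisor spec along these lines might 
look like the sketch below. It assumes the `autocompact` type from this table 
and reuses the `wikipedia` / `segmentGranularity` compaction configuration from 
the existing auto-compaction example. The exact set of supported fields is an 
assumption and should be checked against the implementation.
   
   ```json
   {
     "type": "autocompact",
     "spec": {
       "dataSource": "wikipedia",
       "granularitySpec": {
         "segmentGranularity": "DAY"
       }
     }
   }
   ```
   
   The point of the example is that everything under `spec` is the same object 
a user would otherwise submit as the compaction configuration.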



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
 
-At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the 
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the 
Overlord and MSQ task engine for compaction provides faster compaction times as 
well as better memory tuning and usage. Both methods use the same syntax, but 
you use different methods to submit the automatic compaction.
 
 :::info
  Auto-compaction skips datasources that have a segment granularity of `ALL`.
 :::
 
 As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
 
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
 
-## Enable automatic compaction
+## Coordinator-based

Review Comment:
   ```suggestion
   ## Enable automatic compaction using Coordinator duties
   ```



##########
docs/ingestion/supervisor.md:
##########
@@ -23,22 +23,22 @@ sidebar_label: Supervisor
   ~ under the License.
   -->
 
-A supervisor manages streaming ingestion from external streaming sources into 
Apache Druid.
-Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained.
+Apache Druid uses supervisors to manage streaming ingestion from external 
streaming sources into Druid.
+Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained. They can also be used to for [automatic 
compaction](../data-management/automatic-compaction.md) after data is ingested. 
 
 This topic uses the Apache Kafka term offset to refer to the identifier for 
records in a partition. If you are using Amazon Kinesis, the equivalent is 
sequence number.
 
 ## Supervisor spec
 
-Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks.
-The supervisor spec specifies how Druid should consume, process, and index 
streaming data.
+Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks or compaction tasks.

Review Comment:
   ```suggestion
   Druid uses a JSON specification, often referred to as the supervisor spec, 
to define tasks used for streaming ingestion or auto-compaction.
   ```



##########
docs/ingestion/supervisor.md:
##########
@@ -23,22 +23,22 @@ sidebar_label: Supervisor
   ~ under the License.
   -->
 
-A supervisor manages streaming ingestion from external streaming sources into 
Apache Druid.
-Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained.
+Apache Druid uses supervisors to manage streaming ingestion from external 
streaming sources into Druid.
+Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained. They can also be used to for [automatic 
compaction](../data-management/automatic-compaction.md) after data is ingested. 
 
 This topic uses the Apache Kafka term offset to refer to the identifier for 
records in a partition. If you are using Amazon Kinesis, the equivalent is 
sequence number.
 
 ## Supervisor spec
 
-Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks.
-The supervisor spec specifies how Druid should consume, process, and index 
streaming data.
+Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks or compaction tasks.
+The supervisor spec specifies how Druid should consume, process, and index 
streaming data or automatic compaction tasks.
 
 The following table outlines the high-level configuration options for a 
supervisor spec:
 
 |Property|Type|Description|Required|
 |--------|----|-----------|--------|
-|`type`|String|The supervisor type. One of `kafka`or `kinesis`.|Yes|
-|`spec`|Object|The container object for the supervisor configuration.|Yes|
+|`type`|String|The supervisor type. For streaming ingestion, either `kafka`or 
`kinesis`. For automatic compaction, set the type to `autocompact`. |Yes|

Review Comment:
   ```suggestion
   |`type`|String|The supervisor type. For streaming ingestion, this can be 
one of `kafka`, `kinesis`, or `rabbit`. For automatic compaction, set the type 
to `autocompact`. |Yes|
   ```



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
 
-At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the 
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the 
Overlord and MSQ task engine for compaction provides faster compaction times as 
well as better memory tuning and usage. Both methods use the same syntax, but 
you use different methods to submit the automatic compaction.
 
 :::info
  Auto-compaction skips datasources that have a segment granularity of `ALL`.
 :::
 
 As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
 
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
 
-## Enable automatic compaction
+## Coordinator-based
 
-You can enable automatic compaction for a datasource using the web console or 
programmatically via an API.
-This process differs for manual compaction tasks, which can be submitted from 
the [Tasks view of the web console](../operations/web-console.md) or the [Tasks 
API](../api-reference/tasks-api.md).
+The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+This time period affects other Coordinator duties including merge and 
conversion tasks.
+To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction runs](#compaction-frequency).
 
-### Web console
+At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
+When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
+If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
 
-Use the web console to enable automatic compaction for a datasource as follows.
+No additional configuration is needed to run automatic compaction tasks using 
the Coordinator and native engine. This is the default behavior for Druid.
 
-1. Click **Datasources** in the top-level navigation.
-2. In the **Compaction** column, click the edit icon for the datasource to 
compact.
-3. In the **Compaction config** dialog, configure the auto-compaction 
settings. The dialog offers a form view as well as a JSON view. Editing the 
form updates the JSON specification, and editing the JSON updates the form 
field, if present. Form fields not present in the JSON indicate default values. 
You may add additional properties to the JSON for auto-compaction settings not 
displayed in the form. See [Configure automatic 
compaction](#configure-automatic-compaction) for supported settings for 
auto-compaction.
-4. Click **Submit**.
-5. Refresh the **Datasources** view. The **Compaction** column for the 
datasource changes from “Not enabled” to “Awaiting first run.”
+## Overlord-based
 
-The following screenshot shows the compaction config dialog for a datasource 
with auto-compaction enabled.
-![Compaction config in web console](../assets/compaction-config.png)
+You can run automatic compaction using the Overlord rather than the 
Coordinator. Running compaction tasks on the Overlord means that polling the 
task status and running compaction at a higher frequency is more efficient than 
a comparable compaction task that runs on the Coordinator. When running 
compaction tasks using the Overlord, Druid checks to see if there is data to 
compact in a datasource every 5 seconds.
 
-To disable auto-compaction for a datasource, click **Delete** from the 
**Compaction config** dialog. Druid does not retain your auto-compaction 
configuration.
+* In your Overlord runtime properties, set the following properties:
+  *  `druid.supervisor.compaction.enabled` to `true` so that compaction tasks 
can be run as a supervisor task
+  *  `druid.supervisor.compaction.defaultEngine` to  `msq` to specify the MSQ 
task engine as the compaction engine or to `native`.
 
-### Compaction configuration API
+After making these changes, you can submit automatic compaction tasks as 
supervisors. For more general information about supervisors, see 
[Supervisors](../ingestion/supervisor.md).
 
-Use the [Automatic compaction 
API](../api-reference/automatic-compaction-api.md#manage-automatic-compaction) 
to configure automatic compaction.
-To enable auto-compaction for a datasource, create a JSON object with the 
desired auto-compaction settings.
-See [Configure automatic compaction](#configure-automatic-compaction) for the 
syntax of an auto-compaction spec.
-Send the JSON object as a payload in a [`POST` 
request](../api-reference/automatic-compaction-api.md#create-or-update-automatic-compaction-configuration)
 to `/druid/coordinator/v1/config/compaction`.
-The following example configures auto-compaction for the `wikipedia` 
datasource:
-
-```sh
-curl --location --request POST 
'http://localhost:8081/druid/coordinator/v1/config/compaction' \
---header 'Content-Type: application/json' \
---data-raw '{
-    "dataSource": "wikipedia",
-    "granularitySpec": {
-        "segmentGranularity": "DAY"
-    }
-}'
-```
-
-To disable auto-compaction for a datasource, send a [`DELETE` 
request](../api-reference/automatic-compaction-api.md#remove-automatic-compaction-configuration)
 to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace 
`{dataSource}` with the name of the datasource for which to disable 
auto-compaction. For example:
-
-```sh
-curl --location --request DELETE 
'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
-```
-
-## Configure automatic compaction
+## Automatic compaction syntax
 
 You can configure automatic compaction dynamically without restarting Druid.
-The automatic compaction system uses the following syntax:
+Both the native and MSQ task engine automatic compaction engines use the 
following syntax:

Review Comment:
   These passages probably don't need to refer to Overlord-based compaction 
and/or the MSQ engine. For the most part, we should leave the current docs 
untouched and just add a section about Overlord-based compaction at the end and 
call out the similarities / differences there.



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).

Review Comment:
   It would also be good to add a sentence that mentions manual compaction, to 
distinguish automatic compaction from manual compaction.



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).

Review Comment:
   We can clarify whether it's the Coordinator or the Overlord later in the 
docs.
   
   ```suggestion
   In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks issued by Druid itself.
   ```



##########
docs/ingestion/supervisor.md:
##########
@@ -23,22 +23,22 @@ sidebar_label: Supervisor
   ~ under the License.
   -->
 
-A supervisor manages streaming ingestion from external streaming sources into 
Apache Druid.
-Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained.
+Apache Druid uses supervisors to manage streaming ingestion from external 
streaming sources into Druid.
+Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained. They can also be used to for [automatic 
compaction](../data-management/automatic-compaction.md) after data is ingested. 
 
 This topic uses the Apache Kafka term offset to refer to the identifier for 
records in a partition. If you are using Amazon Kinesis, the equivalent is 
sequence number.
 
 ## Supervisor spec
 
-Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks.
-The supervisor spec specifies how Druid should consume, process, and index 
streaming data.
+Druid uses a JSON specification, often referred to as the supervisor spec, to 
define streaming ingestion tasks or compaction tasks.
+The supervisor spec specifies how Druid should consume, process, and index 
streaming data or automatic compaction tasks.

Review Comment:
   ```suggestion
   The supervisor spec specifies how Druid should consume, process, and index 
data from an external stream or from Druid itself.
   ```



##########
docs/ingestion/supervisor.md:
##########
@@ -23,22 +23,22 @@ sidebar_label: Supervisor
   ~ under the License.
   -->
 
-A supervisor manages streaming ingestion from external streaming sources into 
Apache Druid.
-Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained.
+Apache Druid uses supervisors to manage streaming ingestion from external 
streaming sources into Druid.
+Supervisors oversee the state of indexing tasks to coordinate handoffs, manage 
failures, and ensure that the scalability and replication requirements are 
maintained. They can also be used to for [automatic 
compaction](../data-management/automatic-compaction.md) after data is ingested. 

Review Comment:
   ```suggestion
   Supervisors oversee the state of indexing tasks to coordinate handoffs, 
manage failures, and ensure that the scalability and replication requirements 
are maintained. They can also be used to perform [automatic 
compaction](../data-management/automatic-compaction.md) after data has been 
ingested.
   ```



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
 
-At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the 
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the 
Overlord and MSQ task engine for compaction provides faster compaction times as 
well as better memory tuning and usage. Both methods use the same syntax, but 
you use different methods to submit the automatic compaction.
 
 :::info
  Auto-compaction skips datasources that have a segment granularity of `ALL`.
 :::
 
 As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
 
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
 
-## Enable automatic compaction
+## Coordinator-based
 
-You can enable automatic compaction for a datasource using the web console or 
programmatically via an API.
-This process differs for manual compaction tasks, which can be submitted from 
the [Tasks view of the web console](../operations/web-console.md) or the [Tasks 
API](../api-reference/tasks-api.md).
+The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+This time period affects other Coordinator duties including merge and 
conversion tasks.
+To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction runs](#compaction-frequency).
 
-### Web console
+At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
+When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
+If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
 
-Use the web console to enable automatic compaction for a datasource as follows.
+No additional configuration is needed to run automatic compaction tasks using 
the Coordinator and native engine. This is the default behavior for Druid.
 
-1. Click **Datasources** in the top-level navigation.
-2. In the **Compaction** column, click the edit icon for the datasource to 
compact.
-3. In the **Compaction config** dialog, configure the auto-compaction 
settings. The dialog offers a form view as well as a JSON view. Editing the 
form updates the JSON specification, and editing the JSON updates the form 
field, if present. Form fields not present in the JSON indicate default values. 
You may add additional properties to the JSON for auto-compaction settings not 
displayed in the form. See [Configure automatic 
compaction](#configure-automatic-compaction) for supported settings for 
auto-compaction.
-4. Click **Submit**.
-5. Refresh the **Datasources** view. The **Compaction** column for the 
datasource changes from “Not enabled” to “Awaiting first run.”
+## Overlord-based

Review Comment:
   All of the Overlord-based content should be moved to the end of this page and 
must be called out as experimental.



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
 
-At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the 
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the 
Overlord and MSQ task engine for compaction provides faster compaction times as 
well as better memory tuning and usage. Both methods use the same syntax, but 
you use different methods to submit the automatic compaction.
 
 :::info
  Auto-compaction skips datasources that have a segment granularity of `ALL`.
 :::
 
 As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
 
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
 
-## Enable automatic compaction
+## Coordinator-based

Review Comment:
   Consider adding an info note inside this section stating that 
`Coordinator-based compaction does not support compaction using the MSQ engine`.
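   
   A possible shape for this callout, as a sketch only: it assumes the same 
`:::info` admonition syntax that this page already uses for the segment 
granularity `ALL` note, and the final wording and placement are up to the author.
   
   ```markdown
   :::info
    Coordinator-based compaction does not support compaction using the MSQ engine.
   :::
   ```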



##########
docs/data-management/automatic-compaction.md:
##########
@@ -22,76 +22,45 @@ title: "Automatic compaction"
   ~ under the License.
   -->
 
-In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md).
-This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
-
-## How Druid manages automatic compaction
-
-The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
-The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
-This time period affects other Coordinator duties including merge and 
conversion tasks.
-To configure the auto-compaction time period without interfering with 
`indexingPeriod`, see [Set frequency of compaction 
runs](#set-frequency-of-compaction-runs).
+In Apache Druid, compaction is a special type of ingestion task that reads 
data from a Druid datasource and writes it back into the same datasource. A 
common use case for this is to [optimally size 
segments](../operations/segment-optimization.md) after ingestion to improve 
query performance. Automatic compaction, or auto-compaction, refers to the 
system for automatic execution of compaction tasks managed by the [Druid 
Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md).
 
-At every invocation of auto-compaction, the Coordinator initiates a [segment 
search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) 
to determine eligible segments to compact.
-When there are eligible segments to compact, the Coordinator issues compaction 
tasks based on available worker capacity.
-If a compaction task takes longer than the indexing period, the Coordinator 
waits for it to finish before resuming the period for segment search.
+You can specify whether Druid uses the native engine on the Coordinator or the 
multi-stage query (MSQ) task engine or native engine on the Overlord. Using the 
Overlord and MSQ task engine for compaction provides faster compaction times as 
well as better memory tuning and usage. Both methods use the same syntax, but 
you use different methods to submit the automatic compaction.
 
 :::info
  Auto-compaction skips datasources that have a segment granularity of `ALL`.
 :::
 
 As a best practice, you should set up auto-compaction for all Druid 
datasources. You can run compaction tasks manually for cases where you want to 
allocate more system resources. For example, you may choose to run multiple 
compaction tasks in parallel to compact an existing datasource for the first 
time. See [Compaction](compaction.md) for additional details and use cases.
 
+This topic guides you through setting up automatic compaction for your Druid 
cluster. See the [examples](#examples) for common use cases for automatic 
compaction.
 
-## Enable automatic compaction
+## Coordinator-based
 
-You can enable automatic compaction for a datasource using the web console or 
programmatically via an API.
-This process differs for manual compaction tasks, which can be submitted from 
the [Tasks view of the web console](../operations/web-console.md) or the [Tasks 
API](../api-reference/tasks-api.md).
+The Coordinator [indexing 
period](../configuration/index.md#coordinator-operation), 
`druid.coordinator.period.indexingPeriod`, controls the frequency of compaction 
tasks.
+The default indexing period is 30 minutes, meaning that the Coordinator first 
checks for segments to compact at most 30 minutes from when auto-compaction is 
enabled.
+This time period affects other Coordinator duties including merge and 
conversion tasks.

Review Comment:
   ```suggestion
   This time period also affects other Coordinator duties such as cleanup of 
unused segments and stale pending segments.
   ```
   
   Merge and conversion tasks do not exist in Druid anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

