kfaraz commented on code in PR #16681: URL: https://github.com/apache/druid/pull/16681#discussion_r1777057486
########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). -At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. -When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. -If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. +You can specify whether Druid uses the native engine on the Coordinator or the multi-stage query (MSQ) task engine or native engine on the Overlord. Using the Overlord and MSQ task engine for compaction provides faster compaction times as well as better memory tuning and usage. Both methods use the same syntax, but you use different methods to submit the automatic compaction. Review Comment: Maybe this passage shouldn't be here. MSQ auto-compaction and compaction supervisors (i.e. Overlord-based compaction) are experimental at the moment. So we want to continue advising users to set up compaction via coordinator duties. This passage can be modified a little and moved into the `Overlord-based compaction` section. ########## docs/ingestion/supervisor.md: ########## @@ -23,22 +23,22 @@ sidebar_label: Supervisor ~ under the License. --> -A supervisor manages streaming ingestion from external streaming sources into Apache Druid. -Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. +Apache Druid uses supervisors to manage streaming ingestion from external streaming sources into Druid. +Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. They can also be used to for [automatic compaction](../data-management/automatic-compaction.md) after data is ingested. This topic uses the Apache Kafka term offset to refer to the identifier for records in a partition. If you are using Amazon Kinesis, the equivalent is sequence number. ## Supervisor spec -Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks. -The supervisor spec specifies how Druid should consume, process, and index streaming data. +Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks or compaction tasks. +The supervisor spec specifies how Druid should consume, process, and index streaming data or automatic compaction tasks. The following table outlines the high-level configuration options for a supervisor spec: |Property|Type|Description|Required| |--------|----|-----------|--------| -|`type`|String|The supervisor type. One of `kafka`or `kinesis`.|Yes| -|`spec`|Object|The container object for the supervisor configuration.|Yes| +|`type`|String|The supervisor type. For streaming ingestion, either `kafka`or `kinesis`. For automatic compaction, set the type to `autocompact`. |Yes| +|`spec`|Object|The container object for the supervisor configuration. For automatic compaction, this is your compaction configuration. |Yes| Review Comment: ```suggestion |`spec`|Object|The container object for the supervisor configuration. For automatic compaction, this is the same as the compaction configuration. |Yes| ``` ########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). -At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. -When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. -If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. +You can specify whether Druid uses the native engine on the Coordinator or the multi-stage query (MSQ) task engine or native engine on the Overlord. Using the Overlord and MSQ task engine for compaction provides faster compaction times as well as better memory tuning and usage. Both methods use the same syntax, but you use different methods to submit the automatic compaction. :::info Auto-compaction skips datasources that have a segment granularity of `ALL`. ::: As a best practice, you should set up auto-compaction for all Druid datasources. You can run compaction tasks manually for cases where you want to allocate more system resources. For example, you may choose to run multiple compaction tasks in parallel to compact an existing datasource for the first time. See [Compaction](compaction.md) for additional details and use cases. +This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. -## Enable automatic compaction +## Coordinator-based Review Comment: ```suggestion ## Enable automatic compaction using Coordinator duties ``` ########## docs/ingestion/supervisor.md: ########## @@ -23,22 +23,22 @@ sidebar_label: Supervisor ~ under the License. --> -A supervisor manages streaming ingestion from external streaming sources into Apache Druid. -Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. +Apache Druid uses supervisors to manage streaming ingestion from external streaming sources into Druid. +Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. They can also be used to for [automatic compaction](../data-management/automatic-compaction.md) after data is ingested. This topic uses the Apache Kafka term offset to refer to the identifier for records in a partition. If you are using Amazon Kinesis, the equivalent is sequence number. ## Supervisor spec -Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks. -The supervisor spec specifies how Druid should consume, process, and index streaming data. +Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks or compaction tasks. Review Comment: ```suggestion Druid uses a JSON specification, often referred to as the supervisor spec, to define tasks used for streaming ingestion or auto-compaction. ``` ########## docs/ingestion/supervisor.md: ########## @@ -23,22 +23,22 @@ sidebar_label: Supervisor ~ under the License. --> -A supervisor manages streaming ingestion from external streaming sources into Apache Druid. -Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. +Apache Druid uses supervisors to manage streaming ingestion from external streaming sources into Druid. +Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. They can also be used to for [automatic compaction](../data-management/automatic-compaction.md) after data is ingested. This topic uses the Apache Kafka term offset to refer to the identifier for records in a partition. If you are using Amazon Kinesis, the equivalent is sequence number. ## Supervisor spec -Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks. -The supervisor spec specifies how Druid should consume, process, and index streaming data. +Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks or compaction tasks. +The supervisor spec specifies how Druid should consume, process, and index streaming data or automatic compaction tasks. The following table outlines the high-level configuration options for a supervisor spec: |Property|Type|Description|Required| |--------|----|-----------|--------| -|`type`|String|The supervisor type. One of `kafka`or `kinesis`.|Yes| -|`spec`|Object|The container object for the supervisor configuration.|Yes| +|`type`|String|The supervisor type. For streaming ingestion, either `kafka`or `kinesis`. For automatic compaction, set the type to `autocompact`. |Yes| Review Comment: ```suggestion |`type`|String|The supervisor type. For streaming ingestion, this can be either `kafka`, `kinesis` or `rabbit`. For automatic compaction, set the type to `autocompact`. |Yes| ``` ########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). -At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. -When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. -If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. +You can specify whether Druid uses the native engine on the Coordinator or the multi-stage query (MSQ) task engine or native engine on the Overlord. Using the Overlord and MSQ task engine for compaction provides faster compaction times as well as better memory tuning and usage. Both methods use the same syntax, but you use different methods to submit the automatic compaction. :::info Auto-compaction skips datasources that have a segment granularity of `ALL`. ::: As a best practice, you should set up auto-compaction for all Druid datasources. You can run compaction tasks manually for cases where you want to allocate more system resources. For example, you may choose to run multiple compaction tasks in parallel to compact an existing datasource for the first time. See [Compaction](compaction.md) for additional details and use cases. +This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. -## Enable automatic compaction +## Coordinator-based -You can enable automatic compaction for a datasource using the web console or programmatically via an API. -This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../api-reference/tasks-api.md). +The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. +The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. +This time period affects other Coordinator duties including merge and conversion tasks. +To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#compaction-frequency). -### Web console +At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. +When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. +If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. -Use the web console to enable automatic compaction for a datasource as follows. +No additional configuration is needed to run automatic compaction tasks using the Coordinator and native engine. This is the default behavior for Druid. -1. Click **Datasources** in the top-level navigation. -2. In the **Compaction** column, click the edit icon for the datasource to compact. -3. In the **Compaction config** dialog, configure the auto-compaction settings. The dialog offers a form view as well as a JSON view. Editing the form updates the JSON specification, and editing the JSON updates the form field, if present. Form fields not present in the JSON indicate default values. You may add additional properties to the JSON for auto-compaction settings not displayed in the form. See [Configure automatic compaction](#configure-automatic-compaction) for supported settings for auto-compaction. -4. Click **Submit**. -5. Refresh the **Datasources** view. The **Compaction** column for the datasource changes from “Not enabled” to “Awaiting first run.” +## Overlord-based -The following screenshot shows the compaction config dialog for a datasource with auto-compaction enabled. - +You can run automatic compaction using the Overlord rather than the Coordinator. Running compaction tasks on the Overlord means that polling the task status and running compaction at a higher frequency is more efficient than a comparable compaction task that runs on the Coordinator. When running compaction tasks using the Overlord, Druid checks to see if there is data to compact in a datasource every 5 seconds. -To disable auto-compaction for a datasource, click **Delete** from the **Compaction config** dialog. Druid does not retain your auto-compaction configuration. +* In your Overlord runtime properties, set the following properties: + * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks can be run as a supervisor task + * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ task engine as the compaction engine or to `native`. -### Compaction configuration API +After making these changes, you can submit automatic compaction tasks as supervisors. For more general information about supervisors, see [Supervisors](../ingestion/supervisor.md). -Use the [Automatic compaction API](../api-reference/automatic-compaction-api.md#manage-automatic-compaction) to configure automatic compaction. -To enable auto-compaction for a datasource, create a JSON object with the desired auto-compaction settings. -See [Configure automatic compaction](#configure-automatic-compaction) for the syntax of an auto-compaction spec. -Send the JSON object as a payload in a [`POST` request](../api-reference/automatic-compaction-api.md#create-or-update-automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction`. -The following example configures auto-compaction for the `wikipedia` datasource: - -```sh -curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \ ---header 'Content-Type: application/json' \ ---data-raw '{ - "dataSource": "wikipedia", - "granularitySpec": { - "segmentGranularity": "DAY" - } -}' -``` - -To disable auto-compaction for a datasource, send a [`DELETE` request](../api-reference/automatic-compaction-api.md#remove-automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` with the name of the datasource for which to disable auto-compaction. For example: - -```sh -curl --location --request DELETE 'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia' -``` - -## Configure automatic compaction +## Automatic compaction syntax You can configure automatic compaction dynamically without restarting Druid. -The automatic compaction system uses the following syntax: +Both the native and MSQ task engine automatic compaction engines use the following syntax: Review Comment: These passages probably don't need to refer to Overlord-based compaction and/or the MSQ engine. For the most part, we should leave the current docs untouched and just add a section about Overlord-based compaction at the end and call out the similarities / differences there. ########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). Review Comment: It would also be good to have a sentence that refers to manual compaction to distinguish between auto and manual. ########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). Review Comment: We can clarify whether it's the Coordinator or the Overlord later in the docs. ```suggestion In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks issued by Druid itself. ``` ########## docs/ingestion/supervisor.md: ########## @@ -23,22 +23,22 @@ sidebar_label: Supervisor ~ under the License. --> -A supervisor manages streaming ingestion from external streaming sources into Apache Druid. -Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. +Apache Druid uses supervisors to manage streaming ingestion from external streaming sources into Druid. +Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. They can also be used to for [automatic compaction](../data-management/automatic-compaction.md) after data is ingested. This topic uses the Apache Kafka term offset to refer to the identifier for records in a partition. If you are using Amazon Kinesis, the equivalent is sequence number. ## Supervisor spec -Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks. -The supervisor spec specifies how Druid should consume, process, and index streaming data. +Druid uses a JSON specification, often referred to as the supervisor spec, to define streaming ingestion tasks or compaction tasks. +The supervisor spec specifies how Druid should consume, process, and index streaming data or automatic compaction tasks. Review Comment: ```suggestion The supervisor spec specifies how Druid should consume, process, and index data from an external stream or Druid itself. ``` ########## docs/ingestion/supervisor.md: ########## @@ -23,22 +23,22 @@ sidebar_label: Supervisor ~ under the License. --> -A supervisor manages streaming ingestion from external streaming sources into Apache Druid. -Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. +Apache Druid uses supervisors to manage streaming ingestion from external streaming sources into Druid. +Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. They can also be used to for [automatic compaction](../data-management/automatic-compaction.md) after data is ingested. Review Comment: ```suggestion Supervisors oversee the state of indexing tasks to coordinate handoffs, manage failures, and ensure that the scalability and replication requirements are maintained. They can also be used to perform [automatic compaction](../data-management/automatic-compaction.md) after data has been ingested. ``` ########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). -At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. -When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. -If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. +You can specify whether Druid uses the native engine on the Coordinator or the multi-stage query (MSQ) task engine or native engine on the Overlord. Using the Overlord and MSQ task engine for compaction provides faster compaction times as well as better memory tuning and usage. Both methods use the same syntax, but you use different methods to submit the automatic compaction. :::info Auto-compaction skips datasources that have a segment granularity of `ALL`. ::: As a best practice, you should set up auto-compaction for all Druid datasources. You can run compaction tasks manually for cases where you want to allocate more system resources. For example, you may choose to run multiple compaction tasks in parallel to compact an existing datasource for the first time. See [Compaction](compaction.md) for additional details and use cases. +This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. -## Enable automatic compaction +## Coordinator-based -You can enable automatic compaction for a datasource using the web console or programmatically via an API. -This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../api-reference/tasks-api.md). +The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. +The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. +This time period affects other Coordinator duties including merge and conversion tasks. +To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#compaction-frequency). -### Web console +At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. +When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. +If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. -Use the web console to enable automatic compaction for a datasource as follows. +No additional configuration is needed to run automatic compaction tasks using the Coordinator and native engine. This is the default behavior for Druid. -1. Click **Datasources** in the top-level navigation. -2. In the **Compaction** column, click the edit icon for the datasource to compact. -3. In the **Compaction config** dialog, configure the auto-compaction settings. The dialog offers a form view as well as a JSON view. Editing the form updates the JSON specification, and editing the JSON updates the form field, if present. Form fields not present in the JSON indicate default values. You may add additional properties to the JSON for auto-compaction settings not displayed in the form. See [Configure automatic compaction](#configure-automatic-compaction) for supported settings for auto-compaction. -4. Click **Submit**. -5. Refresh the **Datasources** view. The **Compaction** column for the datasource changes from “Not enabled” to “Awaiting first run.” +## Overlord-based Review Comment: All of the Overlord based stuff should be moved to the end of this page and must be called out as experimental. ########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). -At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. -When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. -If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. +You can specify whether Druid uses the native engine on the Coordinator or the multi-stage query (MSQ) task engine or native engine on the Overlord. Using the Overlord and MSQ task engine for compaction provides faster compaction times as well as better memory tuning and usage. Both methods use the same syntax, but you use different methods to submit the automatic compaction. :::info Auto-compaction skips datasources that have a segment granularity of `ALL`. ::: As a best practice, you should set up auto-compaction for all Druid datasources. You can run compaction tasks manually for cases where you want to allocate more system resources. For example, you may choose to run multiple compaction tasks in parallel to compact an existing datasource for the first time. See [Compaction](compaction.md) for additional details and use cases. +This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. -## Enable automatic compaction +## Coordinator-based Review Comment: There can be an info tooltip inside this section saying that `Coordinator-based compaction does not support compaction using the MSQ engine`. ########## docs/data-management/automatic-compaction.md: ########## @@ -22,76 +22,45 @@ title: "Automatic compaction" ~ under the License. --> -In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md). -This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. - -## How Druid manages automatic compaction - -The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. -The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. -This time period affects other Coordinator duties including merge and conversion tasks. -To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Set frequency of compaction runs](#set-frequency-of-compaction-runs). +In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks managed by the [Druid Coordinator](../design/coordinator.md) or the [Overlord](../design/overlord.md). -At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact. -When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity. -If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search. +You can specify whether Druid uses the native engine on the Coordinator or the multi-stage query (MSQ) task engine or native engine on the Overlord. Using the Overlord and MSQ task engine for compaction provides faster compaction times as well as better memory tuning and usage. Both methods use the same syntax, but you use different methods to submit the automatic compaction. :::info Auto-compaction skips datasources that have a segment granularity of `ALL`. ::: As a best practice, you should set up auto-compaction for all Druid datasources. You can run compaction tasks manually for cases where you want to allocate more system resources. For example, you may choose to run multiple compaction tasks in parallel to compact an existing datasource for the first time. See [Compaction](compaction.md) for additional details and use cases. +This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction. -## Enable automatic compaction +## Coordinator-based -You can enable automatic compaction for a datasource using the web console or programmatically via an API. -This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../api-reference/tasks-api.md). +The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks. +The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled. +This time period affects other Coordinator duties including merge and conversion tasks. Review Comment: ```suggestion This time period also affects other Coordinator duties such as cleanup of unused segments and stale pending segments. ``` Merge and conversion tasks do not exist in Druid anymore. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
