317brian commented on code in PR #16681:
URL: https://github.com/apache/druid/pull/16681#discussion_r1778962344
##########
docs/data-management/automatic-compaction.md:
##########
@@ -188,6 +163,108 @@ druid.coordinator.compaction.duties=["compactSegments"]
druid.coordinator.compaction.period=PT60S
```
+## Use Overlord-based automatic compaction
+
+When you use the Overlord for automatic compaction, Druid runs compaction as a supervisor task on the Overlord. Because it runs as a supervisor task, Overlord-based automatic compaction can run frequently while providing faster compaction times as well as better memory tuning and usage.
+
+When you use Overlord-based automatic compaction, you can use either the native compaction engine, as with Coordinator-based automatic compaction, or the [MSQ task engine](#use-msq-for-automatic-compaction).
+
+By default, Druid checks every 5 seconds to determine whether compaction is required.
+
+### Use MSQ for automatic compaction
+
+The MSQ task engine is available as a compaction engine if you configure
compaction tasks to run on the Overlord as a supervisor. To use the MSQ task
engine for automatic compaction, make sure the following requirements are met:
+
+* Load the [MSQ task engine extension](../multi-stage-query/index.md#load-the-extension).
+* In your Overlord runtime properties, set the following properties:
+  * `druid.supervisor.compaction.enabled` to `true` so that compaction tasks can run as supervisor tasks.
+  * `druid.supervisor.compaction.defaultEngine` to `msq` to specify the MSQ task engine as the compaction engine.
+* Have at least two compaction task slots available, or set `compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine requires at least two tasks to run: one controller task and one worker task.
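+
+Putting the two runtime properties above together, the Overlord runtime properties fragment looks like the following sketch (property names are taken from the list above; any other settings in your file are unaffected):
+
+```properties
+druid.supervisor.compaction.enabled=true
+druid.supervisor.compaction.defaultEngine=msq
+```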
+
+You can set [MSQ task engine context parameters](../multi-stage-query/) in `compactionConfig.taskContext` when configuring your datasource for automatic compaction. For example, you can set the maximum number of tasks using the `compactionConfig.taskContext.maxNumTasks` parameter. Some MSQ task engine context parameters overlap with automatic compaction parameters; when settings overlap, set only one of them.
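+
+For example, the following sketch of a datasource compaction config sets the maximum number of tasks through `taskContext` (the `wikipedia` datasource name and the value `3` are illustrative; other fields follow the automatic compaction syntax):
+
+```json
+{
+  "dataSource": "wikipedia",
+  "taskContext": {
+    "maxNumTasks": 3
+  }
+}
+```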
+
+To submit an automatic compaction task, submit a supervisor spec through the UI or API with the type `autocompact` and a `spec` that defines the compaction behavior using the [automatic compaction syntax](#automatic-compaction-syntax). You can use the [web console](#ui-for-overlord-based-compaction) or the [API](#api-for-overlord-based-compaction).
+
+### UI for Overlord-based compaction
+
+To submit a supervisor spec for MSQ task engine automatic compaction, perform the following steps:
+
+1. In the web console, go to the **Supervisors** tab.
+1. Click **...** > **Submit JSON supervisor**.
+1. In the dialog, include the following:
+ - The type of supervisor spec by setting `"type": "autocompact"`
+ - The compaction configuration by adding it to the `spec` field
+   ```json
+   {
+     "type": "autocompact",
+     "spec": {
+       "dataSource": "YOUR_DATASOURCE",
+       ...
+     }
+   }
+   ```
+1. Submit the supervisor.
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
+
+### API for Overlord-based compaction
+
+Submitting an automatic compaction task as a supervisor uses the same endpoint as supervisor specs for streaming ingestion.
+
+The following example configures auto-compaction for the `wikipedia`
datasource:
+
+```sh
+curl --location --request POST 'http://localhost:8081/druid/indexer/v1/supervisor' \
+--header 'Content-Type: application/json' \
+--data-raw '{
+ "type": "autocompact", // required
+ "suspended": false, // optional
+ "spec": { // required
+ "dataSource": "wikipedia", // required
+ "tuningConfig": {...}, // optional
+ "granularitySpec": {...}, // optional
+ ...
+ }
+}'
+```
+
+To stop the automatic compaction task, suspend or terminate the supervisor
through the UI or API.
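+
+For example, the following sketch suspends and then terminates a compaction supervisor using the standard supervisor endpoints, where `YOUR_SUPERVISOR_ID` is a placeholder for the supervisor's ID as shown in the web console or returned by the supervisor API:
+
+```sh
+# Suspend the supervisor; compaction stops until you resume it
+curl --request POST 'http://localhost:8081/druid/indexer/v1/supervisor/YOUR_SUPERVISOR_ID/suspend'
+
+# Terminate the supervisor; this removes it permanently
+curl --request POST 'http://localhost:8081/druid/indexer/v1/supervisor/YOUR_SUPERVISOR_ID/terminate'
+```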
+
+### MSQ task engine limitations
+
+When using the MSQ task engine for auto-compaction, keep the following
limitations in mind:
+
+- The `metricSpec` field is only supported for idempotent aggregators. For more information, see [Idempotent aggregators](#idempotent-aggregators).
+- Only dynamic and range-based partitioning are supported.
+- Set `rollup` to `true` if `metricSpec` is not empty or null. If `metricSpec` is empty or null, set `rollup` to `false`.
+- You cannot group on multi-value dimensions.
+- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use `maxRowsPerSegment` instead.
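+
+For example, under these limitations a dynamic `partitionsSpec` inside `tuningConfig` uses `maxRowsPerSegment` rather than `maxTotalRows` (sketch; the value `5000000` is illustrative):
+
+```json
+{
+  "tuningConfig": {
+    "partitionsSpec": {
+      "type": "dynamic",
+      "maxRowsPerSegment": 5000000
+    }
+  }
+}
+```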
+
+#### Idempotent aggregators
+
+Idempotent aggregators are aggregators that can be applied repeatedly to a column, with each run producing the same result, such as the following `longSum` aggregator:
+
+```json
+{"name": "added", "type": "longSum", "fieldName": "added"}
+```
+
+where the input and output columns are both `added`.
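
As an illustrative aside (plain Python, not Druid code), the idempotence property can be sketched as follows: re-applying a `longSum`-style rollup to its own output leaves the result unchanged when the input and output column names match:

```python
def long_sum(rows, field_name, name):
    """Sketch of a longSum-style rollup: sum field_name across rows into a
    single row whose column is called name."""
    return [{name: sum(r[field_name] for r in rows)}]

raw = [{"added": 3}, {"added": 5}, {"added": 7}]

# First compaction pass rolls the raw rows up into [{"added": 15}].
pass1 = long_sum(raw, field_name="added", name="added")

# A second pass re-reads the column it wrote and produces the same result: idempotent.
pass2 = long_sum(pass1, field_name="added", name="added")
assert pass1 == pass2 == [{"added": 15}]

# With name="sum_added" instead, a second pass would look for an "added" column
# that no longer exists after the first pass, so the aggregator is not idempotent.
```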
+
+The following are some examples of non-idempotent aggregators, where each run of the aggregator produces different results:
+
+* `longSum` aggregator where the `added` column rolls up into the `sum_added` column:
+  ```json
+  {"name": "sum_added", "type": "longSum", "fieldName": "added"}
+  ```
+* Partial sketches:
+ ```
+ {"name": added, "type":"", fieldName: added}
Review Comment:
Oops, not intentional. Should it be `HLLSketchMerge`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]