Re: [PR] docs: msq autocompaction (druid)

via GitHub Fri, 23 Aug 2024 04:51:40 -0700


gargvishesh commented on code in PR #16681:
URL: https://github.com/apache/druid/pull/16681#discussion_r1716433040



##########
docs/data-management/automatic-compaction.md:
##########
@@ -131,6 +131,52 @@ maximize performance and minimize disk usage of the 
`compact` tasks launched by
 
 For more details on each of the specs in an auto-compaction configuration, see 
[Automatic compaction dynamic 
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
 
+### Compaction engine
+
+When you configure automatic compaction, you can specify whether Druid uses 
the native engine or the multi-stage query (MSQ) task engine to perform the 
compaction.  The native engine was the only engine available for compaction 
prior to the introduction of the MSQ task engine and the corresponding `engine` 
context parameter. 
+
+Using the MSQ task engine for compaction provides faster compaction times as 
well as better memory tuning and usage. For more information about the MSQ task 
engine, see [MSQ task engine concepts](../multi-stage-query/concepts.md).
+
+To use the native compaction engine, either omit the `engine` config when 
submitting your compaction task spec or set it to `native`.
+
+To use the MSQ task engine for automatic compaction, do the following:
+
+* Have the [MSQ  task engine extension 
loaded](../multi-stage-query/index.md#load-the-extension).
+* In the compaction task spec for a datasource, set `compactionConfigs.engine` 
to `msq`. The default is `native`.
+* Have at least two compaction task slots available or set 
`compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine 
requires at least two tasks to run, one controller task and one worker task.
+
+Keep the following limitations in mind MSQ task engine for auto-compaction:
+
+<!--Duplicated in multi-stage-query/known-issues.md-->
+
+- Only range-based partitioning is supported
+- You cannot group or roll up metrics for dimensions 
+- You cannot group on multi-value dimensions
+- The `maxTotalRows` config is not supported. Use `maxRowsPerSegment` instead.
+- `queryGranularity` cannot be set to `all`

Review Comment:
   Some things have changed now. We can use these (or something similar, especially for the first point below) instead:
   
   * `metricsSpec` in the compaction config is supported only if it contains idempotent aggregators, that is, aggregators that can be applied repeatedly to the same column and still produce correct results. For example, `{"name": "added", "type": "longSum", "fieldName": "added"}` is idempotent, but the following aren't:
     * `{"name": "sum_added", "type": "longSum", "fieldName": "added" }` (rolls up the `added` column to a different `sum_added` column)
     * `{"name": added, "type":"", fieldName: added}` (partial sketches can be merged only with `HLLSketchMergeAggregatorFactory`)
     * `{"name": "count", "type": "count"}` (rolls up to a different `count` column)
   * Only dynamic and range-based partitioning are supported.
   * Set `rollup` to `true` only if `metricsSpec` is specified, and to `false` if `metricsSpec` is empty or `null`. If `rollup` is set to `null`, all existing segments to be compacted are analyzed, and rollup is done only if all of them have rollup set to `true`.
   * The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use `maxRowsPerSegment` instead.
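
To make the limitations listed above concrete, here is a minimal Python sketch that checks a compaction config against them. It is an illustration only, not Druid's actual validation logic. The function names, the nesting of `tuningConfig.partitionsSpec` and `granularitySpec.rollup`, and the `SELF_COMBINING_TYPES` allow-list are assumptions made for the example; only `longSum` is confirmed as idempotent by the examples in this comment.

```python
from typing import List

# Assumption: only `longSum` is confirmed as self-combining by the examples
# above. Extend this set only after verifying other aggregator types.
SELF_COMBINING_TYPES = {"longSum"}
SUPPORTED_PARTITIONING = {"dynamic", "range"}


def is_idempotent(aggregator: dict) -> bool:
    """Heuristic: the aggregator reads and writes the same column
    and its type is known to combine with itself."""
    name = aggregator.get("name")
    return (
        name is not None
        and aggregator.get("fieldName") == name
        and aggregator.get("type") in SELF_COMBINING_TYPES
    )


def check_msq_compaction_config(config: dict) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []

    metrics = config.get("metricsSpec") or []
    for aggregator in metrics:
        if not is_idempotent(aggregator):
            problems.append(f"non-idempotent aggregator: {aggregator.get('name')}")

    # Hypothetical nesting: partitionsSpec is looked up under tuningConfig.
    partitions = (config.get("tuningConfig") or {}).get("partitionsSpec") or {}
    partition_type = partitions.get("type")
    if partition_type is not None and partition_type not in SUPPORTED_PARTITIONING:
        problems.append(f"unsupported partitioning: {partition_type}")
    if partition_type == "dynamic" and partitions.get("maxTotalRows") is not None:
        problems.append("maxTotalRows is not supported; use maxRowsPerSegment")

    # Hypothetical nesting: rollup is looked up under granularitySpec.
    # A null (None) rollup is left to be inferred from existing segments.
    rollup = (config.get("granularitySpec") or {}).get("rollup")
    if rollup is True and not metrics:
        problems.append("rollup is true but metricsSpec is empty")
    if rollup is False and metrics:
        problems.append("rollup is false but metricsSpec is specified")

    return problems


config = {
    "metricsSpec": [{"name": "added", "type": "longSum", "fieldName": "added"}],
    "granularitySpec": {"rollup": True},
    "tuningConfig": {
        "partitionsSpec": {"type": "dynamic", "maxRowsPerSegment": 5000000}
    },
}
print(check_msq_compaction_config(config))  # prints []
```

The check for idempotency is deliberately conservative: an aggregator whose output name differs from its input field, or whose type is not on the allow-list, is flagged, which matches the `sum_added` and `count` examples above.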
   



