gargvishesh commented on code in PR #16681:
URL: https://github.com/apache/druid/pull/16681#discussion_r1716366537
##########
docs/multi-stage-query/known-issues.md:
##########
@@ -68,3 +68,17 @@ properties, and the `indexSpec` [`tuningConfig`](../ingestion/ingestion-spec.md#
- The maximum number of elements in a window cannot exceed a value of 100,000.
- To avoid `leafOperators` in MSQ engine, window functions have an extra scan stage after the window stage for cases where native engine has a non-empty `leafOperator`.
+
+## Automatic compaction
+
+<!--This list also exists in data-management/automatic-compaction-->
+
+The following known issues and limitations affect automatic compaction with the MSQ task engine:
+
+- Only range-based partitioning is supported
+- You cannot group or roll up metrics for dimensions
+- You cannot group on multi-value dimensions
+- The `maxTotalRows` config is not supported. Use `maxRowsPerSegment` instead.
+- `queryGranularity` cannot be set to `all`
Review Comment:
The `queryGranularity` limitation can be removed.
##########
docs/data-management/automatic-compaction.md:
##########
@@ -131,6 +131,52 @@ maximize performance and minimize disk usage of the `compact` tasks launched by
For more details on each of the specs in an auto-compaction configuration, see [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
+### Compaction engine
+
+When you configure automatic compaction, you can specify whether Druid uses the native engine or the multi-stage query (MSQ) task engine to perform the compaction. The native engine was the only engine available for compaction prior to the introduction of the MSQ task engine and the corresponding `engine` context parameter.
+
+Using the MSQ task engine for compaction provides faster compaction times as well as better memory tuning and usage. For more information about the MSQ task engine, see [MSQ task engine concepts](../multi-stage-query/concepts.md).
+
+To use the native compaction engine, either omit the `engine` config when submitting your compaction task spec or set it to `native`.
+
+To use the MSQ task engine for automatic compaction, do the following:
+
+* Have the [MSQ task engine extension loaded](../multi-stage-query/index.md#load-the-extension).
+* In the compaction task spec for a datasource, set `compactionConfigs.engine` to `msq`. The default is `native`.
+* Have at least two compaction task slots available or set `compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine requires at least two tasks to run, one controller task and one worker task.
+
+Keep the following limitations in mind MSQ task engine for auto-compaction:
+
+<!--Duplicated in multi-stage-query/known-issues.md-->
+
+- Only range-based partitioning is supported
+- You cannot group or roll up metrics for dimensions
+- You cannot group on multi-value dimensions
+- The `maxTotalRows` config is not supported. Use `maxRowsPerSegment` instead.
+- `queryGranularity` cannot be set to `all`
Review Comment:
Some things have changed now. We can use these (or something similar, especially for the first point below) instead:
* `metricsSpec` in the compaction config is only supported if it has idempotent aggregators, i.e., aggregators that can be applied repeatedly on the same column to produce correct results. For example, `{"name": "added", "type": "longSum", "fieldName": "added"}` is idempotent, but the following aren't:
  * `{"name": "sum_added", "type": "longSum", "fieldName": "added"}` (rolls up the `added` column to a different `sum_added` column)
  * `{"name": "added", "type": "", "fieldName": "added"}` (partial sketches can be merged only with `HLLSketchMergeAggregatorFactory`)
  * `{"name": "count", "type": "count"}` (rolls up to a different `count` column)
* Only dynamic and range-based partitioning are supported.
* `rollup` should be set to `true` if and only if `metricsSpec` is specified.
* The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use `maxRowsPerSegment` instead.
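A minimal Python sketch of the rules listed above, as one might use them to sanity-check a compaction config before submitting it. This is a hypothetical helper, not Druid's own validation logic: `SELF_COMBINING_TYPES`, `is_idempotent`, and `check_msq_compaction_config` are illustrative names, the set of "self-combining" aggregator types is a simplified assumption, the sample config (including the `wikipedia` datasource) is made up, and the sketch assumes `partitionsSpec` defaults to `dynamic` when omitted.

```python
from typing import Any, Dict, List

# Assumption: simple aggregators whose combining form is the same aggregator,
# so re-applying them to their own output column gives the same result.
SELF_COMBINING_TYPES = {
    "longSum", "doubleSum", "floatSum",
    "longMin", "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax",
}


def is_idempotent(agg: Dict[str, Any]) -> bool:
    """True if the aggregator writes back to the column it reads from and
    its type is in the (assumed) self-combining set."""
    return (
        agg.get("type") in SELF_COMBINING_TYPES
        and agg.get("fieldName") is not None
        and agg.get("name") == agg.get("fieldName")
    )


def check_msq_compaction_config(config: Dict[str, Any]) -> List[str]:
    """Return the rules a config violates; an empty list means none."""
    problems: List[str] = []
    if config.get("engine") != "msq":
        return problems  # these limitations apply only to the MSQ engine

    # Rule: metricsSpec may contain only idempotent aggregators.
    metrics = config.get("metricsSpec") or []
    for agg in metrics:
        if not is_idempotent(agg):
            problems.append(f"non-idempotent aggregator: {agg.get('name')}")

    # Rule: only dynamic and range partitioning; no maxTotalRows for dynamic.
    partitions = (config.get("tuningConfig") or {}).get("partitionsSpec") or {}
    ptype = partitions.get("type", "dynamic")  # assumed default
    if ptype not in ("dynamic", "range"):
        problems.append(f"unsupported partitionsSpec type: {ptype}")
    if ptype == "dynamic" and "maxTotalRows" in partitions:
        problems.append("maxTotalRows is not supported; use maxRowsPerSegment")

    # Rule: rollup is true if and only if metricsSpec is specified.
    rollup = (config.get("granularitySpec") or {}).get("rollup", False)
    if bool(rollup) != bool(metrics):
        problems.append("rollup must be true if and only if metricsSpec is set")

    return problems


# Hypothetical example config.
config = {
    "dataSource": "wikipedia",
    "engine": "msq",
    "taskContext": {"maxNumTasks": 2},
    "granularitySpec": {"rollup": True},
    "metricsSpec": [
        {"name": "added", "type": "longSum", "fieldName": "added"},
        {"name": "count", "type": "count"},
    ],
    "tuningConfig": {"partitionsSpec": {"type": "dynamic", "maxTotalRows": 5000000}},
}
print(check_msq_compaction_config(config))
# ['non-idempotent aggregator: count', 'maxTotalRows is not supported; use maxRowsPerSegment']
```

The check keys idempotency on `name == fieldName` plus a self-combining type, which matches the reviewer's examples (`longSum` into the same column passes; writing to `sum_added`, `count`, and sketch-building aggregators fail) but is not an exhaustive definition.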
##########
docs/data-management/automatic-compaction.md:
##########
@@ -131,6 +131,52 @@ maximize performance and minimize disk usage of the `compact` tasks launched by
For more details on each of the specs in an auto-compaction configuration, see [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
+
+#### MSQ task engine context parameters
+
+You can use [MSQ task engine context parameters](../multi-stage-query/) in `compactionConfig.taskContext` when configuring your datasource for automatic compaction, such as setting the maximum number of tasks using the `compactionConfig.taskContext.maxNumTasks` parameter. Some of the MSQ task engine context parameters overlap with automatic compaction parameters. When these settings overlap, set one or the other.
+
+The following table lists each MSQ task engine context parameter alongside the corresponding automatic compaction config:
+
+| MSQ task engine context parameter | Automatic compaction config |
+|--------------------------------------------|---------------------------------------------|
+| `context.priority` | `taskPriority` |
+| `context.rowsPerSegment` | `tuningConfig.targetRowsPerSegment` |
+| `context.priority` | `taskContext.priority` |
+| `context.storeCompactionState` | `taskContext.storeCompactionState` |
+| `sqlQueryContext.sqlInsertSegmentGranularity` | `granularitySpec.segmentGranularity` |
+| `spec.query.dataSource` or `dataSource` | `dataSource` |
+| `spec.tuningConfig.indexSpec` | `tuningConfig.indexSpec` |
+| `spec.query.orderBy` | `tuningConfig.indexSpec` |
+| `spec.query.granularity` | `granularitySpec.queryGranularity` |
+| `spec.query.dimensions` | `dimensionsSpec` |
+| `spec.query.filter` | `transformSpec.filter` |
+| `spec.query.aggregations` | `metricsSpec` |
+
+
Review Comment:
This is an internal detail of how the full compaction config translates to an MSQ task spec and is immaterial to the user, so it can be skipped.
##########
docs/data-management/automatic-compaction.md:
##########
@@ -131,6 +131,52 @@ maximize performance and minimize disk usage of the `compact` tasks launched by
For more details on each of the specs in an auto-compaction configuration, see [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
+Keep the following limitations in mind MSQ task engine for auto-compaction:
Review Comment:
```suggestion
Keep the following limitations in mind when using the MSQ task engine for auto-compaction:
```
--
This is an automated message from the Apache Git Service.