This is an automated email from the ASF dual-hosted git repository.
techdocsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 0206a2da5c Update automatic compaction docs with consistent
terminology (#12416)
0206a2da5c is described below
commit 0206a2da5c6211caf451751876d847f40c2e7755
Author: Victoria Lim <[email protected]>
AuthorDate: Tue May 3 16:22:25 2022 -0700
Update automatic compaction docs with consistent terminology (#12416)
* specify automatic compaction where applicable
* Apply suggestions from code review
Co-authored-by: Katya Macedo <[email protected]>
* update for style and consistency
* implement suggested feedback
* remove duplicate example
* Apply suggestions from code review
Co-authored-by: Katya Macedo <[email protected]>
* Update docs/ingestion/compaction.md
Co-authored-by: Katya Macedo <[email protected]>
* Update docs/operations/api-reference.md
* update .spelling
* Adopt review suggestions
Co-authored-by: Katya Macedo <[email protected]>
---
docs/configuration/index.md | 40 ++++++++++++-------------
docs/design/coordinator.md | 35 +++++++++++-----------
docs/ingestion/compaction.md | 52 +++++++++++----------------------
docs/ingestion/tasks.md | 3 ++
docs/operations/api-reference.md | 42 +++++++++++++-------------
docs/operations/segment-optimization.md | 8 ++---
website/.spelling | 5 ++--
7 files changed, 84 insertions(+), 101 deletions(-)
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index e598b36ea0..1e829bce74 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -951,14 +951,14 @@ These configuration options control the behavior of the
Lookup dynamic configura
|`druid.manager.lookups.threadPoolSize`|How many processes can be managed concurrently (concurrent POST and DELETE requests). Requests beyond this limit wait in a queue until a slot becomes available.|10|
|`druid.manager.lookups.period`|How many milliseconds between checks for
configuration changes|30_000|
-##### Compaction Dynamic Configuration
+##### Automatic compaction dynamic configuration
-Compaction configurations can also be set or updated dynamically using
-[Coordinator's API](../operations/api-reference.md#compaction-configuration)
without restarting Coordinators.
+You can set or update automatic compaction properties dynamically using the
+[Coordinator
API](../operations/api-reference.md#automatic-compaction-configuration) without
restarting Coordinators.
-For details about segment compaction, please check [Segment Size
Optimization](../operations/segment-optimization.md).
+For details about segment compaction, see [Segment size
optimization](../operations/segment-optimization.md).
-A description of the compaction config is:
+You can configure automatic compaction through the following properties:
|Property|Description|Required|
|--------|-----------|--------|
@@ -966,16 +966,16 @@ A description of the compaction config is:
|`taskPriority`|[Priority](../ingestion/tasks.md#priority) of compaction
task.|no (default = 25)|
|`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per
compaction task. Since a time chunk must be processed in its entirety, if the
segments for a particular time chunk have a total size in bytes greater than
this parameter, compaction will not run for that time chunk. Because each
compaction task runs with a single thread, setting this value too far above
1–2GB will result in compaction tasks taking an excessive amount of time.|no
(default = Long.MAX_VALUE)|
|`maxRowsPerSegment`|Max number of rows per segment after compaction.|no|
-|`skipOffsetFromLatest`|The offset for searching segments to be compacted in
[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly
recommended to set for realtime dataSources. See [Data handling with
compaction](../ingestion/compaction.md#data-handling-with-compaction)|no
(default = "P1D")|
-|`tuningConfig`|Tuning config for compaction tasks. See below [Compaction Task
TuningConfig](#automatic-compaction-tuningconfig).|no|
+|`skipOffsetFromLatest`|The offset for searching segments to be compacted in
[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly
recommended to set for realtime dataSources. See [Data handling with
compaction](../ingestion/compaction.md#data-handling-with-compaction).|no
(default = "P1D")|
+|`tuningConfig`|Tuning config for compaction tasks. See below [Automatic
compaction tuningConfig](#automatic-compaction-tuningconfig).|no|
|`taskContext`|[Task context](../ingestion/tasks.md#context) for compaction
tasks.|no|
-|`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction
granularitySpec](#automatic-compaction-granularityspec)|No|
-|`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction
dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
-|`transformSpec`|Custom `transformSpec`. See [Automatic compaction
transformSpec](#automatic-compaction-transform-spec)|No|
+|`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction
granularitySpec](#automatic-compaction-granularityspec).|No|
+|`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction
dimensionsSpec](#automatic-compaction-dimensionsspec).|No|
+|`transformSpec`|Custom `transformSpec`. See [Automatic compaction
transformSpec](#automatic-compaction-transformspec).|No|
|`metricsSpec`|Custom
[`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction
task preserves any existing metrics regardless of whether `metricsSpec` is
specified. If `metricsSpec` is specified, Druid does not reapply any
aggregators matching the metric names specified in `metricsSpec` to rows that
already have the associated metrics. For rows that do not already have the
metric specified in `metricsSpec`, Druid applies the metric aggregator on the
source column, then [...]
-|`ioConfig`|IO config for compaction tasks. See below [Compaction Task
IOConfig](#automatic-compaction-ioconfig).|no|
+|`ioConfig`|IO config for compaction tasks. See [Automatic compaction
ioConfig](#automatic-compaction-ioconfig).|no|
-An example of compaction config is:
+Automatic compaction config example:
```json
{
@@ -989,10 +989,10 @@ An example of compaction config is:
Compaction tasks fail when higher priority tasks cause Druid to revoke their locks. By default, realtime tasks like ingestion have a higher priority than compaction tasks. Therefore, frequent conflicts between compaction tasks and realtime tasks can cause the Coordinator's automatic compaction to get stuck. You may see this issue with streaming ingestion from Kafka and Kinesis, which ingest late-arriving data. To mitigate this problem, set `skipOffsetFromLatest` to a value large enough so that arriving data tends to fall outside the offset value from the current time. This way you can avoid conflicts between compaction tasks and realtime ingestion tasks.
-###### Automatic compaction TuningConfig
+###### Automatic compaction tuningConfig
-Auto compaction supports a subset of the [tuningConfig for Parallel
task](../ingestion/native-batch.md#tuningconfig).
-The below is a list of the supported configurations for auto compaction.
+Auto-compaction supports a subset of the [tuningConfig for Parallel task](../ingestion/native-batch.md#tuningconfig).
+The following table lists the supported configurations for auto-compaction.
|Property|Description|Required|
|--------|-----------|--------|
@@ -1022,22 +1022,22 @@ The below is a list of the supported configurations for
auto compaction.
|`queryGranularity`|The resolution of timestamp storage within each segment.
Defaults to 'null', which preserves the original query granularity. Accepts all
[Query granularity](../querying/granularities.md) values.|No|
|`rollup`|Whether to enable ingestion-time rollup or not. Defaults to 'null', which preserves the original setting. Note that once data is rolled up, individual records can no longer be recovered.|No|
-###### Automatic compaction dimensions spec
+###### Automatic compaction dimensionsSpec
|Field|Description|Required|
|-----|-----------|--------|
|`dimensions`| A list of dimension names or objects. Defaults to 'null', which
preserves the original dimensions. Note that setting this will cause segments
manually compacted with `dimensionExclusions` to be compacted again.|No|
-###### Automatic compaction transform spec
+###### Automatic compaction transformSpec
|Field|Description|Required|
|-----|-----------|--------|
|`filter`| The `filter` conditionally filters input rows during compaction.
Only rows that pass the filter will be included in the compacted segments. Any
of Druid's standard [query filters](../querying/filters.md) can be used.
Defaults to 'null', which will not filter any row. |No|
-###### Automatic compaction IOConfig
+###### Automatic compaction ioConfig
-Auto compaction supports a subset of the [IOConfig for Parallel
task](../ingestion/native-batch.md).
-The below is a list of the supported configurations for auto compaction.
+Auto-compaction supports a subset of the [ioConfig for Parallel task](../ingestion/native-batch.md).
+The following table lists the supported configurations for auto-compaction.
|Property|Description|Default|Required|
|--------|-----------|-------|--------|
diff --git a/docs/design/coordinator.md b/docs/design/coordinator.md
index a3d33eca9f..44f7297bb5 100644
--- a/docs/design/coordinator.md
+++ b/docs/design/coordinator.md
@@ -79,39 +79,38 @@ If a Historical process restarts or becomes unavailable for
any reason, the Drui
To ensure an even distribution of segments across Historical processes in the
cluster, the Coordinator process will find the total size of all segments being
served by every Historical process each time the Coordinator runs. For every
Historical process tier in the cluster, the Coordinator process will determine
the Historical process with the highest utilization and the Historical process
with the lowest utilization. The percent difference in utilization between the
two processes is com [...]
-### Compacting Segments
+### Automatic compaction
-Each run, the Druid Coordinator compacts segments by merging small segments or
splitting a large one. This is useful when your segments are not optimized
-in terms of segment size which may degrade query performance. See [Segment
Size Optimization](../operations/segment-optimization.md) for details.
+The Druid Coordinator manages the automatic compaction system.
+Each run, the Coordinator compacts segments by merging small segments or splitting a large one. This is useful when the size of your segments is not optimized, which may degrade query performance.
+See [Segment size optimization](../operations/segment-optimization.md) for details.
-The Coordinator first finds the segments to compact based on the [segment
search policy](#segment-search-policy).
+The Coordinator first finds the segments to compact based on the [segment
search policy](#segment-search-policy-in-automatic-compaction).
Once some segments are found, it issues a [compaction
task](../ingestion/tasks.md#compact) to compact those segments.
The maximum number of running compaction tasks is `min(sum of worker capacity
* slotRatio, maxSlots)`.
-Note that even though `min(sum of worker capacity * slotRatio, maxSlots)` = 0,
at least one compaction task is always submitted
+Note that even if `min(sum of worker capacity * slotRatio, maxSlots) = 0`, at
least one compaction task is always submitted
if the compaction is enabled for a dataSource.
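The slot computation above can be sketched as follows (a sketch under the stated formula, assuming the ratio is truncated to a whole number of slots; names are hypothetical, not Druid APIs):

```python
def max_compaction_tasks(worker_capacities, slot_ratio, max_slots):
    # min(sum of worker capacity * slotRatio, maxSlots), truncated to an
    # integer number of task slots.
    slots = min(int(sum(worker_capacities) * slot_ratio), max_slots)
    # Even when the formula yields 0, at least one compaction task is
    # submitted if compaction is enabled for a dataSource.
    return max(slots, 1)

print(max_compaction_tasks([10, 10, 10], 0.1, 100))  # 3
print(max_compaction_tasks([10], 0.01, 100))         # 1
```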
-See [Compaction Configuration
API](../operations/api-reference.md#compaction-configuration) and [Compaction
Configuration](../configuration/index.md#compaction-dynamic-configuration) to
enable the compaction.
+See [Automatic compaction configuration
API](../operations/api-reference.md#automatic-compaction-configuration) and
[Automatic compaction
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration)
to enable and configure automatic compaction.
-Compaction tasks might fail due to the following reasons.
+Compaction tasks might fail due to the following reasons:
- If the input segments of a compaction task are removed or overshadowed
before it starts, that compaction task fails immediately.
- If a task of a higher priority acquires a [time chunk
lock](../ingestion/tasks.md#locking) for an interval overlapping with the
interval of a compaction task, the compaction task fails.
Once a compaction task fails, the Coordinator simply checks the segments in
the interval of the failed task again, and issues another compaction task in
the next run.
-Note that Compacting Segments Coordinator Duty is automatically enabled and
run as part of the Indexing Service Duties group. However, Compacting Segments
Coordinator Duty can be configured to run in isolation as a separate
coordinator duty group. This allows changing the period of Compacting Segments
Coordinator Duty without impacting the period of other Indexing Service Duties.
This can be done by setting the following properties (for more details see
[custom pluggable Coordinator Duty [...]
+Note that Compacting Segments Coordinator Duty is automatically enabled and
run as part of the Indexing Service Duties group. However, Compacting Segments
Coordinator Duty can be configured to run in isolation as a separate
Coordinator duty group. This allows changing the period of Compacting Segments
Coordinator Duty without impacting the period of other Indexing Service Duties.
This can be done by setting the following properties. For more details, see
[custom pluggable Coordinator Dut [...]
```
druid.coordinator.dutyGroups=[<SOME_GROUP_NAME>]
druid.coordinator.<SOME_GROUP_NAME>.duties=["compactSegments"]
druid.coordinator.<SOME_GROUP_NAME>.period=<PERIOD_TO_RUN_COMPACTING_SEGMENTS_DUTY>
```
-### Segment search policy
+### Segment search policy in automatic compaction
-#### Recent segment first policy
-
-At every coordinator run, this policy looks up time chunks in order of
newest-to-oldest and checks whether the segments in those time chunks
-need compaction or not.
-A set of segments need compaction if all conditions below are satisfied.
+At every Coordinator run, this policy looks up time chunks from newest to
oldest and checks whether the segments in those time chunks
+need compaction.
+A set of segments needs compaction if all conditions below are satisfied:
1) Total size of segments in the time chunk is smaller than or equal to the configured `inputSegmentSizeBytes`.
2) Segments have never been compacted yet, or the compaction spec has been updated since the last compaction, especially `maxRowsPerSegment`, `maxTotalRows`, and `indexSpec`.
@@ -130,22 +129,22 @@ Assuming that each segment is 10 MB and hasn't been compacted yet, this policy
`foo_2017-11-01T00:00:00.000Z_2017-12-01T00:00:00.000Z_VERSION` and
`foo_2017-11-01T00:00:00.000Z_2017-12-01T00:00:00.000Z_VERSION_1` to compact
together because
`2017-11-01T00:00:00.000Z/2017-12-01T00:00:00.000Z` is the most recent time
chunk.
-If the coordinator has enough task slots for compaction, this policy will
continue searching for the next segments and return
+If the Coordinator has enough task slots for compaction, this policy will
continue searching for the next segments and return
`bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION` and
`bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION_1`.
Finally, `foo_2017-09-01T00:00:00.000Z_2017-10-01T00:00:00.000Z_VERSION` will
be picked up even though there is only one segment in the time chunk of
`2017-09-01T00:00:00.000Z/2017-10-01T00:00:00.000Z`.
-The search start point can be changed by setting
[skipOffsetFromLatest](../configuration/index.md#compaction-dynamic-configuration).
+The search start point can be changed by setting
[`skipOffsetFromLatest`](../configuration/index.md#automatic-compaction-dynamic-configuration).
If this is set, this policy will ignore the segments falling into the time
chunk of (the end time of the most recent segment - `skipOffsetFromLatest`).
This is to avoid conflicts between compaction tasks and realtime tasks.
Note that realtime tasks have a higher priority than compaction tasks by
default. Realtime tasks will revoke the locks of compaction tasks if their
intervals overlap, resulting in the termination of the compaction task.
> This policy currently cannot handle the situation when there are a lot of
> small segments which have the same interval,
-> and their total size exceeds
[inputSegmentSizeBytes](../configuration/index.md#compaction-dynamic-configuration).
+> and their total size exceeds
[`inputSegmentSizeBytes`](../configuration/index.md#automatic-compaction-dynamic-configuration).
> If it finds such segments, it simply skips them.
### The Coordinator console
-The Druid Coordinator exposes a web GUI for displaying cluster information and
rule configuration. For more details, please see [coordinator
console](../operations/management-uis.md#coordinator-consoles).
+The Druid Coordinator exposes a web GUI for displaying cluster information and
rule configuration. For more details, see [Coordinator
console](../operations/management-uis.md#coordinator-consoles).
### FAQ
diff --git a/docs/ingestion/compaction.md b/docs/ingestion/compaction.md
index 379e1d3497..16ef17a2d3 100644
--- a/docs/ingestion/compaction.md
+++ b/docs/ingestion/compaction.md
@@ -28,7 +28,7 @@ Query performance in Apache Druid depends on optimally sized
segments. Compactio
There are several cases where you should consider compaction for segment optimization:
-- With streaming ingestion, data can arrive out of chronological order
creating lots of small segments.
+- With streaming ingestion, data can arrive out of chronological order
creating many small segments.
- When you append data using `appendToExisting` for [native batch](native-batch.md) ingestion, creating suboptimal segments.
- When you use `index_parallel` for parallel batch indexing and the parallel
ingestion tasks create many small segments.
- When a misconfigured ingestion task creates oversized segments.
@@ -36,7 +36,7 @@ There are several cases to consider compaction for segment
optimization:
By default, compaction does not modify the underlying data of the segments.
However, there are cases when you may want to modify data during compaction to
improve query performance:
- If, after ingestion, you realize that data for the time interval is sparse,
you can use compaction to increase the segment granularity.
-- Over time you don't need fine-grained granularity for older data so you want
use compaction to change older segments to a coarser query granularity. This
reduces the storage space required for older data. For example from `minute` to
`hour`, or `hour` to `day`.
+- If you don't need fine-grained granularity for older data, you can use
compaction to change older segments to a coarser query granularity. For
example, from `minute` to `hour` or `hour` to `day`. This reduces the storage
space required for older data.
- You can change the dimension order to improve sorting and reduce segment
size.
- You can remove unused columns in compaction or implement an aggregation
metric for older data.
- You can change segment rollup from dynamic partitioning with best-effort
rollup to hash or range partitioning with perfect rollup. For more information
on rollup, see [perfect vs best-effort
rollup](./rollup.md#perfect-rollup-vs-best-effort-rollup).
@@ -44,9 +44,10 @@ By default, compaction does not modify the underlying data
of the segments. Howe
Compaction does not improve performance in all situations. For example, if you
rewrite your data with each ingestion task, you don't need to use compaction.
See [Segment optimization](../operations/segment-optimization.md) for
additional guidance to determine if compaction will help in your environment.
## Types of compaction
-You can configure the Druid Coordinator to perform automatic compaction, also
called auto-compaction, for a datasource. Using a segment search policy, the
coordinator periodically identifies segments for compaction starting with the
newest to oldest. When it discovers segments that have not been compacted or
segments that were compacted with a different or changed spec, it submits
compaction task for those segments and only those segments.
-Automatic compaction works in most use cases and should be your first option.
To learn more about automatic compaction, see [Compacting
Segments](../design/coordinator.md#compacting-segments).
+You can configure the Druid Coordinator to perform automatic compaction, also
called auto-compaction, for a datasource. Using its [segment search
policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction),
the Coordinator periodically identifies segments for compaction starting from
newest to oldest. When the Coordinator discovers segments that have not been
compacted or segments that were compacted with a different or changed spec, it
submits compaction tasks for th [...]
+
+Automatic compaction works in most use cases and should be your first option.
To learn more about automatic compaction, see [Compacting
Segments](../design/coordinator.md#automatic-compaction).
In cases where you require more control over compaction, you can manually
submit compaction tasks. For example:
@@ -62,7 +63,7 @@ During compaction, Druid overwrites the original set of
segments with the compac
You can set `dropExisting` in `ioConfig` to "true" in the compaction task to
configure Druid to replace all existing segments fully contained by the
interval. See the suggestion for reindexing with finer granularity under
[Implementation considerations](native-batch.md#implementation-considerations)
for an example.
> WARNING: `dropExisting` in `ioConfig` is a beta feature.
-If an ingestion task needs to write data to a segment for a time interval
locked for compaction, by default the ingestion task supersedes the compaction
task and the compaction task fails without finishing. For manual compaction
tasks you can adjust the input spec interval to avoid conflicts between
ingestion and compaction. For automatic compaction, you can set the
`skipOffsetFromLatest` key to adjust the auto compaction starting point from
the current time to reduce the chance of confl [...]
+If an ingestion task needs to write data to a segment for a time interval
locked for compaction, by default the ingestion task supersedes the compaction
task and the compaction task fails without finishing. For manual compaction
tasks, you can adjust the input spec interval to avoid conflicts between
ingestion and compaction. For automatic compaction, you can set the
`skipOffsetFromLatest` key to adjust the auto-compaction starting point from
the current time to reduce the chance of conf [...]
### Segment granularity handling
@@ -82,13 +83,14 @@ If you configure query granularity in compaction to go from
a finer granularity
### Dimension handling
-Apache Druid supports schema changes. Therefore, dimensions can be different
across segments even if they are a part of the same data source. See [Different
schemas among
segments](../design/segments.md#different-schemas-among-segments). If the input
segments have different dimensions, the resulting compacted segment include all
dimensions of the input segments.
+Apache Druid supports schema changes. Therefore, dimensions can be different
across segments even if they are a part of the same data source. See [Different
schemas among
segments](../design/segments.md#different-schemas-among-segments). If the input
segments have different dimensions, the resulting compacted segment includes
all dimensions of the input segments.
Even when the input segments have the same set of dimensions, the dimension order or the data type of dimensions can be different. The dimensions of recent segments precede those of older segments in terms of data types and ordering because more recent segments are more likely to have the preferred order and data types.
If you want to control dimension ordering or ensure specific values for
dimension types, you can configure a custom `dimensionsSpec` in the compaction
task spec.
### Rollup
+
Druid only rolls up the output segment when `rollup` is set for all input
segments.
See [Roll-up](../ingestion/rollup.md) for more details.
You can check whether your segments are rolled up by using [Segment Metadata Queries](../querying/segmentmetadataquery.md#analysistypes).
@@ -104,6 +106,7 @@ To perform a manual compaction, you submit a compaction
task. Compaction tasks m
"dataSource": <task_datasource>,
"ioConfig": <IO config>,
"dimensionsSpec": <custom dimensionsSpec>,
+ "transformSpec": <custom transformSpec>,
"metricsSpec": <custom metricsSpec>,
"tuningConfig": <parallel indexing task tuningConfig>,
"granularitySpec": <compaction task granularitySpec>,
@@ -120,14 +123,14 @@ To perform a manual compaction, you submit a compaction
task. Compaction tasks m
|`dimensionsSpec`|Custom `dimensionsSpec`. The compaction task uses the
specified `dimensionsSpec` if it exists instead of generating one. See
[Compaction dimensionsSpec](#compaction-dimensions-spec) for details.|No|
|`transformSpec`|Custom `transformSpec`. The compaction task uses the
specified `transformSpec` rather than using `null`. See [Compaction
transformSpec](#compaction-transform-spec) for details.|No|
|`metricsSpec`|Custom `metricsSpec`. The compaction task uses the specified
`metricsSpec` rather than generating one.|No|
-|`segmentGranularity`|When set, the compaction task changes the segment
granularity for the given interval. Deprecated. Use `granularitySpec`. |No.|
-|`tuningConfig`|[Parallel indexing task
tuningConfig](native-batch.md#tuningconfig).
`awaitSegmentAvailabilityTimeoutMillis` in the tuning config is not currently
supported for compaction tasks. Do not set it to a non-zero value.|No|
-|`context`|[Task context](./tasks.md#context)|No|
+|`segmentGranularity`|When set, the compaction task changes the segment
granularity for the given interval. Deprecated. Use `granularitySpec`. |No|
+|`tuningConfig`|[Parallel indexing task
tuningConfig](native-batch.md#tuningconfig).
`awaitSegmentAvailabilityTimeoutMillis` in the tuning config is not supported
for compaction tasks. Leave this parameter at the default value, 0.|No|
|`granularitySpec`|Custom `granularitySpec`. The compaction task uses the
specified `granularitySpec` rather than generating one. See [Compaction
`granularitySpec`](#compaction-granularity-spec) for details.|No|
+|`context`|[Task context](./tasks.md#context).|No|
> Note: Use `granularitySpec` over `segmentGranularity` and only set one of
> these values. If you specify different values for these in the same
> compaction spec, the task fails.
-To control the number of result segments per time chunk, you can set
[`maxRowsPerSegment`](../configuration/index.md#compaction-dynamic-configuration)
or [`numShards`](../ingestion/native-batch.md#tuningconfig).
+To control the number of result segments per time chunk, you can set
[`maxRowsPerSegment`](../configuration/index.md#automatic-compaction-dynamic-configuration)
or [`numShards`](../ingestion/native-batch.md#tuningconfig).
> You can run multiple compaction tasks in parallel. For example, if you want
> to compact the data for a year, you are not limited to running a single task
> for the entire year. You can run 12 compaction tasks with month-long
> intervals.
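For instance, one of those month-long tasks could be sketched as follows (the dataSource name and interval are hypothetical):

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2021-01-01/2021-02-01"
    }
  }
}
```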
@@ -174,7 +177,7 @@ The compaction `ioConfig` requires specifying `inputSpec`
as follows:
|-----|-----------|-------|--------|
|`type`|Task type: `compact`|none|Yes|
|`inputSpec`|Specification of the target [intervals](#interval-inputspec) or
[segments](#segments-inputspec).|none|Yes|
-|`dropExisting`|If `true` the task replaces all existing segments fully
contained by either of the following:<br>- the `interval` in the `interval`
type `inputSpec`.<br>- the umbrella interval of the `segments` in the `segment`
type `inputSpec`.<br>If compaction fails, Druid does change any of the existing
segments.<br>**WARNING**: `dropExisting` in `ioConfig` is a beta feature.
|false|no|
+|`dropExisting`|If `true`, the task replaces all existing segments fully
contained by either of the following:<br>- the `interval` in the `interval`
type `inputSpec`.<br>- the umbrella interval of the `segments` in the `segment`
type `inputSpec`.<br>If compaction fails, Druid does not change any of the
existing segments.<br>**WARNING**: `dropExisting` in `ioConfig` is a beta
feature. |false|No|
Druid supports two `inputSpec` formats:
@@ -214,31 +217,10 @@ Druid supports two supported `inputSpec` formats:
|`queryGranularity`|The resolution of timestamp storage within each segment.
Defaults to 'null', which preserves the original query granularity. Accepts all
[Query granularity](../querying/granularities.md) values.|No|
|`rollup`|Whether to enable ingestion-time rollup or not. Defaults to 'null', which preserves the original setting. Note that once data is rolled up, individual records can no longer be recovered.|No|
-For example, to set the segment granularity to "day", the query granularity to
"hour", and enabling rollup:
-
-```json
-{
- "type": "compact",
- "dataSource": "wikipedia",
- "ioConfig": {
- "type": "compact",
- "inputSpec": {
- "type": "interval",
- "interval": "2017-01-01/2018-01-01"
- },
- "granularitySpec": {
- "segmentGranularity": "day",
- "queryGranularity": "hour",
- "rollup": true
- }
- }
-}
-```
-
## Learn more
See the following topics for more information:
- [Segment optimization](../operations/segment-optimization.md) for guidance
to determine if compaction will help in your case.
-- [Compacting Segments](../design/coordinator.md#compacting-segments) for more
on automatic compaction.
-- [Compaction Configuration
API](../operations/api-reference.md#compaction-configuration)
-and [Compaction
Configuration](../configuration/index.md#compaction-dynamic-configuration) for
automatic compaction configuration information.
+- [Compacting Segments](../design/coordinator.md#automatic-compaction) for
details on how the Coordinator manages automatic compaction.
+- [Automatic compaction configuration
API](../operations/api-reference.md#automatic-compaction-configuration)
+and [Automatic compaction
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration)
for automatic compaction configuration information.
diff --git a/docs/ingestion/tasks.md b/docs/ingestion/tasks.md
index 54b7661b01..4acece8ce5 100644
--- a/docs/ingestion/tasks.md
+++ b/docs/ingestion/tasks.md
@@ -356,6 +356,9 @@ You can override the task priority by setting your priority
in the task context
The task context is used for various individual task configurations.
Specify task context configurations in the `context` field of the ingestion
spec.
+When configuring [automatic
compaction](../configuration/index.md#automatic-compaction-dynamic-configuration),
set the task context configurations in `taskContext` rather than in `context`.
+The settings get passed into the `context` field of the compaction tasks
issued to MiddleManagers.
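For example, a sketch of an automatic compaction config fragment that passes a context setting through `taskContext` (the dataSource name and value are illustrative):

```json
{
  "dataSource": "wikipedia",
  "taskContext": {
    "priority": 30
  }
}
```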
+
The following parameters apply to all task types.
|property|default|description|
diff --git a/docs/operations/api-reference.md b/docs/operations/api-reference.md
index 7bdb4b4f22..e114b365ad 100644
--- a/docs/operations/api-reference.md
+++ b/docs/operations/api-reference.md
@@ -458,52 +458,52 @@ to filter by interval and limit the number of results
respectively.
Update overlord dynamic worker configuration.
-#### Compaction Status
+#### Automatic compaction status
##### GET
* `/druid/coordinator/v1/compaction/progress?dataSource={dataSource}`
Returns the total size of segments awaiting compaction for the given
dataSource.
-This is only valid for dataSource which has compaction enabled.
+The specified dataSource must have automatic compaction enabled.
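For example, a hypothetical exchange (the response field name and byte count are illustrative, not verified output):

```
GET /druid/coordinator/v1/compaction/progress?dataSource=wikipedia

{"remainingSegmentSize": 107374182400}
```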
##### GET
* `/druid/coordinator/v1/compaction/status`
-Returns the status and statistics from the auto compaction run of all dataSources which have auto compaction enabled in the latest run.
-The response payload includes a list of `latestStatus` objects. Each `latestStatus` represents the status for a dataSource (which has/had auto compaction enabled).
+Returns the status and statistics from the latest auto-compaction run for all dataSources that have auto-compaction enabled.
+The response payload includes a list of `latestStatus` objects. Each `latestStatus` represents the status for a dataSource that has, or previously had, auto-compaction enabled.
The `latestStatus` object has the following keys:
* `dataSource`: name of the datasource for this status information
-* `scheduleStatus`: auto compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING ` if the dataSource has an active auto compaction config submitted otherwise, `NOT_ENABLED`
-* `bytesAwaitingCompaction`: total bytes of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
-* `bytesCompacted`: total bytes of this datasource that are already compacted with the spec set in the auto compaction config.
-* `bytesSkipped`: total bytes of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
-* `segmentCountAwaitingCompaction`: total number of segments of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
-* `segmentCountCompacted`: total number of segments of this datasource that are already compacted with the spec set in the auto compaction config.
-* `segmentCountSkipped`: total number of segments of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
-* `intervalCountAwaitingCompaction`: total number of intervals of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
-* `intervalCountCompacted`: total number of intervals of this datasource that are already compacted with the spec set in the auto compaction config.
-* `intervalCountSkipped`: total number of intervals of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
+* `scheduleStatus`: auto-compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING` if the dataSource has an active auto-compaction config submitted. Otherwise, returns `NOT_ENABLED`.
+* `bytesAwaitingCompaction`: total bytes of this datasource waiting to be compacted by auto-compaction. Only considers intervals/segments that are eligible for auto-compaction.
+* `bytesCompacted`: total bytes of this datasource that are already compacted with the spec set in the auto-compaction config.
+* `bytesSkipped`: total bytes of this datasource that auto-compaction skips because they are not eligible.
+* `segmentCountAwaitingCompaction`: total number of segments of this datasource waiting to be compacted by auto-compaction. Only considers intervals/segments that are eligible for auto-compaction.
+* `segmentCountCompacted`: total number of segments of this datasource that are already compacted with the spec set in the auto-compaction config.
+* `segmentCountSkipped`: total number of segments of this datasource that auto-compaction skips because they are not eligible.
+* `intervalCountAwaitingCompaction`: total number of intervals of this datasource waiting to be compacted by auto-compaction. Only considers intervals/segments that are eligible for auto-compaction.
+* `intervalCountCompacted`: total number of intervals of this datasource that are already compacted with the spec set in the auto-compaction config.
+* `intervalCountSkipped`: total number of intervals of this datasource that auto-compaction skips because they are not eligible.
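For illustration, a response from this endpoint might look like the following sketch. The keys match the list above; the datasource name and all values are hypothetical:

```json
{
  "latestStatus": [
    {
      "dataSource": "wikipedia",
      "scheduleStatus": "RUNNING",
      "bytesAwaitingCompaction": 100000,
      "bytesCompacted": 500000,
      "bytesSkipped": 0,
      "segmentCountAwaitingCompaction": 1,
      "segmentCountCompacted": 5,
      "segmentCountSkipped": 0,
      "intervalCountAwaitingCompaction": 1,
      "intervalCountCompacted": 5,
      "intervalCountSkipped": 0
    }
  ]
}
```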
##### GET
* `/druid/coordinator/v1/compaction/status?dataSource={dataSource}`
Similar to the API `/druid/coordinator/v1/compaction/status` above but filters response to only return information for the {dataSource} given.
-Note that {dataSource} given must have/had auto compaction enabled.
+Note that the specified {dataSource} must have auto-compaction enabled, or must have had it enabled previously.
-#### Compaction Configuration
+#### Automatic compaction configuration
##### GET
* `/druid/coordinator/v1/config/compaction`
-Returns all compaction configs.
+Returns all automatic compaction configs.
* `/druid/coordinator/v1/config/compaction/{dataSource}`
-Returns a compaction config of a dataSource.
+Returns the automatic compaction config for a dataSource.
##### POST
@@ -517,15 +517,15 @@ will be set for them.
* `/druid/coordinator/v1/config/compaction`
-Creates or updates the compaction config for a dataSource.
-See [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) for configuration details.
+Creates or updates the automatic compaction config for a dataSource.
+See [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) for configuration details.
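As a sketch, a minimal POST body for this endpoint might look like the following. The datasource name and values are hypothetical, and the linked configuration docs describe the full set of fields:

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```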
##### DELETE
* `/druid/coordinator/v1/config/compaction/{dataSource}`
-Removes the compaction config for a dataSource.
+Removes the automatic compaction config for a dataSource.
#### Server information
diff --git a/docs/operations/segment-optimization.md b/docs/operations/segment-optimization.md
index f79af5ea8f..93229c40b4 100644
--- a/docs/operations/segment-optimization.md
+++ b/docs/operations/segment-optimization.md
@@ -1,6 +1,6 @@
---
id: segment-optimization
-title: "Segment Size Optimization"
+title: "Segment size optimization"
---
<!--
@@ -87,11 +87,11 @@ In this case, you may want to see only rows of the max version per interval (pai
Once you find your segments need compaction, you can consider the below two options:
- - Turning on the [automatic compaction of Coordinators](../design/coordinator.md#compacting-segments).
+ - Turning on the [automatic compaction of Coordinators](../design/coordinator.md#automatic-compaction).
The Coordinator periodically submits [compaction tasks](../ingestion/tasks.md#compact) to re-index small segments.
To enable the automatic compaction, you need to configure it for each dataSource via Coordinator's dynamic configuration.
- See [Compaction Configuration API](../operations/api-reference.md#compaction-configuration)
- and [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) for details.
+ See [Automatic compaction configuration API](../operations/api-reference.md#automatic-compaction-configuration)
+ and [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) for details.
- Running periodic Hadoop batch ingestion jobs and using a `dataSource` inputSpec to read from the segments generated by the Kafka indexing tasks.
This might be helpful if you want to compact a lot of segments in parallel.
Details on how to do this can be found on the [Updating existing data](../ingestion/data-management.md#update) section
diff --git a/website/.spelling b/website/.spelling
index 89ad85dd32..d23986e9a0 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -55,6 +55,7 @@ DRUIDVERSION
DataSketches
DateTime
DateType
+dimensionsSpec
DimensionSpec
DimensionSpecs
Dockerfile
@@ -112,6 +113,7 @@ InputFormat
InputSource
InputSources
Integer.MAX_VALUE
+ioConfig
JBOD
JDBC
JDK
@@ -671,7 +673,6 @@ baseDataSource
baseDataSource-hashCode
classpathPrefix
derivativeDataSource
-dimensionsSpec
druid.extensions.hadoopDependenciesDir
hadoopDependencyCoordinates
maxTaskCount
@@ -1132,7 +1133,6 @@ datetime
f.example.com
filePattern
forceExtendableShardSpecs
-granularitySpec
ignoreInvalidRows
ignoreWhenNoSegments
indexSpecForIntermediatePersists
@@ -1842,7 +1842,6 @@ cpuacct
dataSourceName
datetime
defaultHistory
-dimensionsSpec
doubleMax
doubleMin
doubleSum
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]