This is an automated email from the ASF dual-hosted git repository.
techdocsmith pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git
The following commit(s) were added to refs/heads/master by this push:
new 0206a2da5c Update automatic compaction docs with consistent
terminology (#12416)
0206a2da5c is described below
commit 0206a2da5c6211caf451751876d847f40c2e7755
Author: Victoria Lim <[email protected]>
AuthorDate: Tue May 3 16:22:25 2022 -0700
Update automatic compaction docs with consistent terminology (#12416)
* specify automatic compaction where applicable
* Apply suggestions from code review
Co-authored-by: Katya Macedo <[email protected]>
* update for style and consistency
* implement suggested feedback
* remove duplicate example
* Apply suggestions from code review
Co-authored-by: Katya Macedo <[email protected]>
* Update docs/ingestion/compaction.md
Co-authored-by: Katya Macedo <[email protected]>
* Update docs/operations/api-reference.md
* update .spelling
* Adopt review suggestions
Co-authored-by: Katya Macedo <[email protected]>
---
docs/configuration/index.md | 40 ++++++++++++-------------
docs/design/coordinator.md | 35 +++++++++++-----------
docs/ingestion/compaction.md | 52 +++++++++++----------------------
docs/ingestion/tasks.md | 3 ++
docs/operations/api-reference.md | 42 +++++++++++++-------------
docs/operations/segment-optimization.md | 8 ++---
website/.spelling | 5 ++--
7 files changed, 84 insertions(+), 101 deletions(-)
diff --git a/docs/configuration/index.md b/docs/configuration/index.md
index e598b36ea0..1e829bce74 100644
--- a/docs/configuration/index.md
+++ b/docs/configuration/index.md
@@ -951,14 +951,14 @@ These configuration options control the behavior of the
Lookup dynamic configura
|`druid.manager.lookups.threadPoolSize`|How many processes can be managed concurrently (concurrent POST and DELETE requests). Requests beyond this limit wait in a queue until a slot becomes available.|10|
|`druid.manager.lookups.period`|How many milliseconds between checks for
configuration changes|30_000|
-##### Compaction Dynamic Configuration
+##### Automatic compaction dynamic configuration
-Compaction configurations can also be set or updated dynamically using
-[Coordinator's API](../operations/api-reference.md#compaction-configuration)
without restarting Coordinators.
+You can set or update automatic compaction properties dynamically using the
+[Coordinator
API](../operations/api-reference.md#automatic-compaction-configuration) without
restarting Coordinators.
-For details about segment compaction, please check [Segment Size
Optimization](../operations/segment-optimization.md).
+For details about segment compaction, see [Segment size
optimization](../operations/segment-optimization.md).
-A description of the compaction config is:
+You can configure automatic compaction through the following properties:
|Property|Description|Required|
|--------|-----------|--------|
@@ -966,16 +966,16 @@ A description of the compaction config is:
|`taskPriority`|[Priority](../ingestion/tasks.md#priority) of compaction
task.|no (default = 25)|
|`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per
compaction task. Since a time chunk must be processed in its entirety, if the
segments for a particular time chunk have a total size in bytes greater than
this parameter, compaction will not run for that time chunk. Because each
compaction task runs with a single thread, setting this value too far above
1–2GB will result in compaction tasks taking an excessive amount of time.|no
(default = Long.MAX_VALUE)|
|`maxRowsPerSegment`|Max number of rows per segment after compaction.|no|
-|`skipOffsetFromLatest`|The offset for searching segments to be compacted in
[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly
recommended to set for realtime dataSources. See [Data handling with
compaction](../ingestion/compaction.md#data-handling-with-compaction)|no
(default = "P1D")|
-|`tuningConfig`|Tuning config for compaction tasks. See below [Compaction Task
TuningConfig](#automatic-compaction-tuningconfig).|no|
+|`skipOffsetFromLatest`|The offset for searching segments to be compacted in
[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly
recommended to set for realtime dataSources. See [Data handling with
compaction](../ingestion/compaction.md#data-handling-with-compaction).|no
(default = "P1D")|
+|`tuningConfig`|Tuning config for compaction tasks. See below [Automatic
compaction tuningConfig](#automatic-compaction-tuningconfig).|no|
|`taskContext`|[Task context](../ingestion/tasks.md#context) for compaction
tasks.|no|
-|`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction
granularitySpec](#automatic-compaction-granularityspec)|No|
-|`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction
dimensionsSpec](#automatic-compaction-dimensions-spec)|No|
-|`transformSpec`|Custom `transformSpec`. See [Automatic compaction
transformSpec](#automatic-compaction-transform-spec)|No|
+|`granularitySpec`|Custom `granularitySpec`. See [Automatic compaction
granularitySpec](#automatic-compaction-granularityspec).|No|
+|`dimensionsSpec`|Custom `dimensionsSpec`. See [Automatic compaction
dimensionsSpec](#automatic-compaction-dimensionsspec).|No|
+|`transformSpec`|Custom `transformSpec`. See [Automatic compaction
transformSpec](#automatic-compaction-transformspec).|No|
|`metricsSpec`|Custom
[`metricsSpec`](../ingestion/ingestion-spec.md#metricsspec). The compaction
task preserves any existing metrics regardless of whether `metricsSpec` is
specified. If `metricsSpec` is specified, Druid does not reapply any
aggregators matching the metric names specified in `metricsSpec` to rows that
already have the associated metrics. For rows that do not already have the
metric specified in `metricsSpec`, Druid applies the metric aggregator on the
source column, then [...]
-|`ioConfig`|IO config for compaction tasks. See below [Compaction Task
IOConfig](#automatic-compaction-ioconfig).|no|
+|`ioConfig`|IO config for compaction tasks. See [Automatic compaction
ioConfig](#automatic-compaction-ioconfig).|no|
-An example of compaction config is:
+Automatic compaction config example:
```json
{
@@ -989,10 +989,10 @@ An example of compaction config is:
Compaction tasks fail when higher priority tasks cause Druid to revoke their locks. By default, realtime tasks like ingestion have a higher priority than compaction tasks. Therefore, frequent conflicts between compaction tasks and realtime tasks can cause the Coordinator's automatic compaction to get stuck. You may see this issue with streaming ingestion from Kafka and Kinesis, which ingest late-arriving data. To mitigate this problem, set `skipOffsetFromLatest` to a value large enough so that arriving data tends to fall outside the offset value from the current time. This way you can avoid conflicts between compaction tasks and realtime ingestion tasks.
-###### Automatic compaction TuningConfig
+###### Automatic compaction tuningConfig
-Auto compaction supports a subset of the [tuningConfig for Parallel
task](../ingestion/native-batch.md#tuningconfig).
-The below is a list of the supported configurations for auto compaction.
+Auto-compaction supports a subset of the [tuningConfig for Parallel task](../ingestion/native-batch.md#tuningconfig).
+The following table lists the supported configurations for auto-compaction.
|Property|Description|Required|
|--------|-----------|--------|
@@ -1022,22 +1022,22 @@ The below is a list of the supported configurations for
auto compaction.
|`queryGranularity`|The resolution of timestamp storage within each segment.
Defaults to 'null', which preserves the original query granularity. Accepts all
[Query granularity](../querying/granularities.md) values.|No|
|`rollup`|Whether to enable ingestion-time rollup or not. Defaults to 'null', which preserves the original setting. Note that once data is rolled up, individual records can no longer be recovered.|No|
-###### Automatic compaction dimensions spec
+###### Automatic compaction dimensionsSpec
|Field|Description|Required|
|-----|-----------|--------|
|`dimensions`| A list of dimension names or objects. Defaults to 'null', which
preserves the original dimensions. Note that setting this will cause segments
manually compacted with `dimensionExclusions` to be compacted again.|No|
-###### Automatic compaction transform spec
+###### Automatic compaction transformSpec
|Field|Description|Required|
|-----|-----------|--------|
|`filter`| The `filter` conditionally filters input rows during compaction.
Only rows that pass the filter will be included in the compacted segments. Any
of Druid's standard [query filters](../querying/filters.md) can be used.
Defaults to 'null', which will not filter any row. |No|
-###### Automatic compaction IOConfig
+###### Automatic compaction ioConfig
-Auto compaction supports a subset of the [IOConfig for Parallel
task](../ingestion/native-batch.md).
-The below is a list of the supported configurations for auto compaction.
+Auto-compaction supports a subset of the [ioConfig for Parallel task](../ingestion/native-batch.md).
+The following table lists the supported configurations for auto-compaction.
|Property|Description|Default|Required|
|--------|-----------|-------|--------|
diff --git a/docs/design/coordinator.md b/docs/design/coordinator.md
index a3d33eca9f..44f7297bb5 100644
--- a/docs/design/coordinator.md
+++ b/docs/design/coordinator.md
@@ -79,39 +79,38 @@ If a Historical process restarts or becomes unavailable for
any reason, the Drui
To ensure an even distribution of segments across Historical processes in the
cluster, the Coordinator process will find the total size of all segments being
served by every Historical process each time the Coordinator runs. For every
Historical process tier in the cluster, the Coordinator process will determine
the Historical process with the highest utilization and the Historical process
with the lowest utilization. The percent difference in utilization between the
two processes is com [...]
-### Compacting Segments
+### Automatic compaction
-Each run, the Druid Coordinator compacts segments by merging small segments or
splitting a large one. This is useful when your segments are not optimized
-in terms of segment size which may degrade query performance. See [Segment
Size Optimization](../operations/segment-optimization.md) for details.
+The Druid Coordinator manages the automatic compaction system.
+Each run, the Coordinator compacts segments by merging small segments or splitting a large one. This is useful when the size of your segments is not optimized, which may degrade query performance.
+See [Segment size optimization](../operations/segment-optimization.md) for details.
-The Coordinator first finds the segments to compact based on the [segment
search policy](#segment-search-policy).
+The Coordinator first finds the segments to compact based on the [segment
search policy](#segment-search-policy-in-automatic-compaction).
Once some segments are found, it issues a [compaction
task](../ingestion/tasks.md#compact) to compact those segments.
The maximum number of running compaction tasks is `min(sum of worker capacity
* slotRatio, maxSlots)`.
-Note that even though `min(sum of worker capacity * slotRatio, maxSlots)` = 0,
at least one compaction task is always submitted
+Note that even if `min(sum of worker capacity * slotRatio, maxSlots) = 0`, at
least one compaction task is always submitted
if the compaction is enabled for a dataSource.
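The slot computation above can be sketched as follows (a sketch under the stated formula, assuming the ratio is truncated to a whole number of slots; names are hypothetical, not Druid APIs):

```python
def max_compaction_tasks(worker_capacities, slot_ratio, max_slots):
    # min(sum of worker capacity * slotRatio, maxSlots), truncated to an
    # integer number of task slots.
    slots = min(int(sum(worker_capacities) * slot_ratio), max_slots)
    # Even when the formula yields 0, at least one compaction task is
    # submitted if compaction is enabled for a dataSource.
    return max(slots, 1)

print(max_compaction_tasks([10, 10, 10], 0.1, 100))  # 3
print(max_compaction_tasks([10], 0.01, 100))         # 1
```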
-See [Compaction Configuration
API](../operations/api-reference.md#compaction-configuration) and [Compaction
Configuration](../configuration/index.md#compaction-dynamic-configuration) to
enable the compaction.
+See [Automatic compaction configuration
API](../operations/api-reference.md#automatic-compaction-configuration) and
[Automatic compaction
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration)
to enable and configure automatic compaction.
-Compaction tasks might fail due to the following reasons.
+Compaction tasks might fail due to the following reasons:
- If the input segments of a compaction task are removed or overshadowed
before it starts, that compaction task fails immediately.
- If a task of a higher priority acquires a [time chunk
lock](../ingestion/tasks.md#locking) for an interval overlapping with the
interval of a compaction task, the compaction task fails.
Once a compaction task fails, the Coordinator simply checks the segments in
the interval of the failed task again, and issues another compaction task in
the next run.
-Note that Compacting Segments Coordinator Duty is automatically enabled and
run as part of the Indexing Service Duties group. However, Compacting Segments
Coordinator Duty can be configured to run in isolation as a separate
coordinator duty group. This allows changing the period of Compacting Segments
Coordinator Duty without impacting the period of other Indexing Service Duties.
This can be done by setting the following properties (for more details see
[custom pluggable Coordinator Duty [...]
+Note that Compacting Segments Coordinator Duty is automatically enabled and
run as part of the Indexing Service Duties group. However, Compacting Segments
Coordinator Duty can be configured to run in isolation as a separate
Coordinator duty group. This allows changing the period of Compacting Segments
Coordinator Duty without impacting the period of other Indexing Service Duties.
This can be done by setting the following properties. For more details, see
[custom pluggable Coordinator Dut [...]
```
druid.coordinator.dutyGroups=[<SOME_GROUP_NAME>]
druid.coordinator.<SOME_GROUP_NAME>.duties=["compactSegments"]
druid.coordinator.<SOME_GROUP_NAME>.period=<PERIOD_TO_RUN_COMPACTING_SEGMENTS_DUTY>
```
-### Segment search policy
+### Segment search policy in automatic compaction
-#### Recent segment first policy
-
-At every coordinator run, this policy looks up time chunks in order of
newest-to-oldest and checks whether the segments in those time chunks
-need compaction or not.
-A set of segments need compaction if all conditions below are satisfied.
+At every Coordinator run, this policy looks up time chunks from newest to
oldest and checks whether the segments in those time chunks
+need compaction.
+A set of segments needs compaction if all conditions below are satisfied:
1) Total size of segments in the time chunk is smaller than or equal to the configured `inputSegmentSizeBytes`.
2) Segments have never been compacted yet, or the compaction spec has been updated since the last compaction, especially `maxRowsPerSegment`, `maxTotalRows`, and `indexSpec`.
@@ -130,22 +129,22 @@ Assuming that each segment is 10 MB and hasn't been compacted yet, this policy
`foo_2017-11-01T00:00:00.000Z_2017-12-01T00:00:00.000Z_VERSION` and
`foo_2017-11-01T00:00:00.000Z_2017-12-01T00:00:00.000Z_VERSION_1` to compact
together because
`2017-11-01T00:00:00.000Z/2017-12-01T00:00:00.000Z` is the most recent time
chunk.
-If the coordinator has enough task slots for compaction, this policy will
continue searching for the next segments and return
+If the Coordinator has enough task slots for compaction, this policy will
continue searching for the next segments and return
`bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION` and
`bar_2017-10-01T00:00:00.000Z_2017-11-01T00:00:00.000Z_VERSION_1`.
Finally, `foo_2017-09-01T00:00:00.000Z_2017-10-01T00:00:00.000Z_VERSION` will
be picked up even though there is only one segment in the time chunk of
`2017-09-01T00:00:00.000Z/2017-10-01T00:00:00.000Z`.
-The search start point can be changed by setting
[skipOffsetFromLatest](../configuration/index.md#compaction-dynamic-configuration).
+The search start point can be changed by setting
[`skipOffsetFromLatest`](../configuration/index.md#automatic-compaction-dynamic-configuration).
If this is set, this policy will ignore the segments falling into the time
chunk of (the end time of the most recent segment - `skipOffsetFromLatest`).
This is to avoid conflicts between compaction tasks and realtime tasks.
Note that realtime tasks have a higher priority than compaction tasks by
default. Realtime tasks will revoke the locks of compaction tasks if their
intervals overlap, resulting in the termination of the compaction task.
> This policy currently cannot handle the situation when there are a lot of
> small segments which have the same interval,
-> and their total size exceeds
[inputSegmentSizeBytes](../configuration/index.md#compaction-dynamic-configuration).
+> and their total size exceeds
[`inputSegmentSizeBytes`](../configuration/index.md#automatic-compaction-dynamic-configuration).
> If it finds such segments, it simply skips them.
### The Coordinator console
-The Druid Coordinator exposes a web GUI for displaying cluster information and
rule configuration. For more details, please see [coordinator
console](../operations/management-uis.md#coordinator-consoles).
+The Druid Coordinator exposes a web GUI for displaying cluster information and
rule configuration. For more details, see [Coordinator
console](../operations/management-uis.md#coordinator-consoles).
### FAQ
diff --git a/docs/ingestion/compaction.md b/docs/ingestion/compaction.md
index 379e1d3497..16ef17a2d3 100644
--- a/docs/ingestion/compaction.md
+++ b/docs/ingestion/compaction.md
@@ -28,7 +28,7 @@ Query performance in Apache Druid depends on optimally sized
segments. Compactio
There are several cases where you should consider compaction for segment optimization:
-- With streaming ingestion, data can arrive out of chronological order
creating lots of small segments.
+- With streaming ingestion, data can arrive out of chronological order
creating many small segments.
- When you append data using `appendToExisting` for [native batch](native-batch.md) ingestion, creating suboptimal segments.
- When you use `index_parallel` for parallel batch indexing and the parallel
ingestion tasks create many small segments.
- When a misconfigured ingestion task creates oversized segments.
@@ -36,7 +36,7 @@ There are several cases to consider compaction for segment
optimization:
By default, compaction does not modify the underlying data of the segments.
However, there are cases when you may want to modify data during compaction to
improve query performance:
- If, after ingestion, you realize that data for the time interval is sparse,
you can use compaction to increase the segment granularity.
-- Over time you don't need fine-grained granularity for older data so you want
use compaction to change older segments to a coarser query granularity. This
reduces the storage space required for older data. For example from `minute` to
`hour`, or `hour` to `day`.
+- If you don't need fine-grained granularity for older data, you can use
compaction to change older segments to a coarser query granularity. For
example, from `minute` to `hour` or `hour` to `day`. This reduces the storage
space required for older data.
- You can change the dimension order to improve sorting and reduce segment
size.
- You can remove unused columns in compaction or implement an aggregation
metric for older data.
- You can change segment rollup from dynamic partitioning with best-effort
rollup to hash or range partitioning with perfect rollup. For more information
on rollup, see [perfect vs best-effort
rollup](./rollup.md#perfect-rollup-vs-best-effort-rollup).
@@ -44,9 +44,10 @@ By default, compaction does not modify the underlying data
of the segments. Howe
Compaction does not improve performance in all situations. For example, if you
rewrite your data with each ingestion task, you don't need to use compaction.
See [Segment optimization](../operations/segment-optimization.md) for
additional guidance to determine if compaction will help in your environment.
## Types of compaction
-You can configure the Druid Coordinator to perform automatic compaction, also
called auto-compaction, for a datasource. Using a segment search policy, the
coordinator periodically identifies segments for compaction starting with the
newest to oldest. When it discovers segments that have not been compacted or
segments that were compacted with a different or changed spec, it submits
compaction task for those segments and only those segments.
-Automatic compaction works in most use cases and should be your first option.
To learn more about automatic compaction, see [Compacting
Segments](../design/coordinator.md#compacting-segments).
+You can configure the Druid Coordinator to perform automatic compaction, also
called auto-compaction, for a datasource. Using its [segment search
policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction),
the Coordinator periodically identifies segments for compaction starting from
newest to oldest. When the Coordinator discovers segments that have not been
compacted or segments that were compacted with a different or changed spec, it
submits compaction tasks for th [...]
+
+Automatic compaction works in most use cases and should be your first option.
To learn more about automatic compaction, see [Compacting
Segments](../design/coordinator.md#automatic-compaction).
In cases where you require more control over compaction, you can manually
submit compaction tasks. For example:
@@ -62,7 +63,7 @@ During compaction, Druid overwrites the original set of
segments with the compac
You can set `dropExisting` in `ioConfig` to "true" in the compaction task to
configure Druid to replace all existing segments fully contained by the
interval. See the suggestion for reindexing with finer granularity under
[Implementation considerations](native-batch.md#implementation-considerations)
for an example.
> WARNING: `dropExisting` in `ioConfig` is a beta feature.
-If an ingestion task needs to write data to a segment for a time interval
locked for compaction, by default the ingestion task supersedes the compaction
task and the compaction task fails without finishing. For manual compaction
tasks you can adjust the input spec interval to avoid conflicts between
ingestion and compaction. For automatic compaction, you can set the
`skipOffsetFromLatest` key to adjust the auto compaction starting point from
the current time to reduce the chance of confl [...]
+If an ingestion task needs to write data to a segment for a time interval
locked for compaction, by default the ingestion task supersedes the compaction
task and the compaction task fails without finishing. For manual compaction
tasks, you can adjust the input spec interval to avoid conflicts between
ingestion and compaction. For automatic compaction, you can set the
`skipOffsetFromLatest` key to adjust the auto-compaction starting point from
the current time to reduce the chance of conf [...]
### Segment granularity handling
@@ -82,13 +83,14 @@ If you configure query granularity in compaction to go from
a finer granularity
### Dimension handling
-Apache Druid supports schema changes. Therefore, dimensions can be different
across segments even if they are a part of the same data source. See [Different
schemas among
segments](../design/segments.md#different-schemas-among-segments). If the input
segments have different dimensions, the resulting compacted segment include all
dimensions of the input segments.
+Apache Druid supports schema changes. Therefore, dimensions can be different
across segments even if they are a part of the same data source. See [Different
schemas among
segments](../design/segments.md#different-schemas-among-segments). If the input
segments have different dimensions, the resulting compacted segment includes
all dimensions of the input segments.
Even when the input segments have the same set of dimensions, the dimension order or the data type of dimensions can be different. The dimensions of recent segments precede those of older segments in terms of data types and ordering because more recent segments are more likely to have the preferred order and data types.
If you want to control dimension ordering or ensure specific values for
dimension types, you can configure a custom `dimensionsSpec` in the compaction
task spec.
### Rollup
+
Druid only rolls up the output segment when `rollup` is set for all input
segments.
See [Roll-up](../ingestion/rollup.md) for more details.
You can check whether your segments are rolled up by using [Segment Metadata Queries](../querying/segmentmetadataquery.md#analysistypes).
@@ -104,6 +106,7 @@ To perform a manual compaction, you submit a compaction
task. Compaction tasks m
"dataSource": <task_datasource>,
"ioConfig": <IO config>,
"dimensionsSpec": <custom dimensionsSpec>,
+ "transformSpec": <custom transformSpec>,
"metricsSpec": <custom metricsSpec>,
"tuningConfig": <parallel indexing task tuningConfig>,
"granularitySpec": <compaction task granularitySpec>,
@@ -120,14 +123,14 @@ To perform a manual compaction, you submit a compaction
task. Compaction tasks m
|`dimensionsSpec`|Custom `dimensionsSpec`. The compaction task uses the
specified `dimensionsSpec` if it exists instead of generating one. See
[Compaction dimensionsSpec](#compaction-dimensions-spec) for details.|No|
|`transformSpec`|Custom `transformSpec`. The compaction task uses the
specified `transformSpec` rather than using `null`. See [Compaction
transformSpec](#compaction-transform-spec) for details.|No|
|`metricsSpec`|Custom `metricsSpec`. The compaction task uses the specified
`metricsSpec` rather than generating one.|No|
-|`segmentGranularity`|When set, the compaction task changes the segment
granularity for the given interval. Deprecated. Use `granularitySpec`. |No.|
-|`tuningConfig`|[Parallel indexing task
tuningConfig](native-batch.md#tuningconfig).
`awaitSegmentAvailabilityTimeoutMillis` in the tuning config is not currently
supported for compaction tasks. Do not set it to a non-zero value.|No|
-|`context`|[Task context](./tasks.md#context)|No|
+|`segmentGranularity`|When set, the compaction task changes the segment
granularity for the given interval. Deprecated. Use `granularitySpec`. |No|
+|`tuningConfig`|[Parallel indexing task
tuningConfig](native-batch.md#tuningconfig).
`awaitSegmentAvailabilityTimeoutMillis` in the tuning config is not supported
for compaction tasks. Leave this parameter at the default value, 0.|No|
|`granularitySpec`|Custom `granularitySpec`. The compaction task uses the
specified `granularitySpec` rather than generating one. See [Compaction
`granularitySpec`](#compaction-granularity-spec) for details.|No|
+|`context`|[Task context](./tasks.md#context).|No|
> Note: Use `granularitySpec` over `segmentGranularity` and only set one of
> these values. If you specify different values for these in the same
> compaction spec, the task fails.
-To control the number of result segments per time chunk, you can set
[`maxRowsPerSegment`](../configuration/index.md#compaction-dynamic-configuration)
or [`numShards`](../ingestion/native-batch.md#tuningconfig).
+To control the number of result segments per time chunk, you can set
[`maxRowsPerSegment`](../configuration/index.md#automatic-compaction-dynamic-configuration)
or [`numShards`](../ingestion/native-batch.md#tuningconfig).
> You can run multiple compaction tasks in parallel. For example, if you want
> to compact the data for a year, you are not limited to running a single task
> for the entire year. You can run 12 compaction tasks with month-long
> intervals.
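For instance, one of those month-long tasks could be sketched as follows (the dataSource name and interval are hypothetical):

```json
{
  "type": "compact",
  "dataSource": "wikipedia",
  "ioConfig": {
    "type": "compact",
    "inputSpec": {
      "type": "interval",
      "interval": "2021-01-01/2021-02-01"
    }
  }
}
```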
@@ -174,7 +177,7 @@ The compaction `ioConfig` requires specifying `inputSpec`
as follows:
|-----|-----------|-------|--------|
|`type`|Task type: `compact`|none|Yes|
|`inputSpec`|Specification of the target [intervals](#interval-inputspec) or
[segments](#segments-inputspec).|none|Yes|
-|`dropExisting`|If `true` the task replaces all existing segments fully
contained by either of the following:<br>- the `interval` in the `interval`
type `inputSpec`.<br>- the umbrella interval of the `segments` in the `segment`
type `inputSpec`.<br>If compaction fails, Druid does change any of the existing
segments.<br>**WARNING**: `dropExisting` in `ioConfig` is a beta feature.
|false|no|
+|`dropExisting`|If `true`, the task replaces all existing segments fully
contained by either of the following:<br>- the `interval` in the `interval`
type `inputSpec`.<br>- the umbrella interval of the `segments` in the `segment`
type `inputSpec`.<br>If compaction fails, Druid does not change any of the
existing segments.<br>**WARNING**: `dropExisting` in `ioConfig` is a beta
feature. |false|No|
Druid supports two `inputSpec` formats:
@@ -214,31 +217,10 @@ Druid supports two supported `inputSpec` formats:
|`queryGranularity`|The resolution of timestamp storage within each segment.
Defaults to 'null', which preserves the original query granularity. Accepts all
[Query granularity](../querying/granularities.md) values.|No|
|`rollup`|Whether to enable ingestion-time rollup or not. Defaults to 'null', which preserves the original setting. Note that once data is rolled up, individual records can no longer be recovered.|No|
-For example, to set the segment granularity to "day", the query granularity to
"hour", and enabling rollup:
-
-```json
-{
- "type": "compact",
- "dataSource": "wikipedia",
- "ioConfig": {
- "type": "compact",
- "inputSpec": {
- "type": "interval",
- "interval": "2017-01-01/2018-01-01"
- },
- "granularitySpec": {
- "segmentGranularity": "day",
- "queryGranularity": "hour",
- "rollup": true
- }
- }
-}
-```
-
## Learn more
See the following topics for more information:
- [Segment optimization](../operations/segment-optimization.md) for guidance
to determine if compaction will help in your case.
-- [Compacting Segments](../design/coordinator.md#compacting-segments) for more
on automatic compaction.
-- [Compaction Configuration
API](../operations/api-reference.md#compaction-configuration)
-and [Compaction
Configuration](../configuration/index.md#compaction-dynamic-configuration) for
automatic compaction configuration information.
+- [Compacting Segments](../design/coordinator.md#automatic-compaction) for
details on how the Coordinator manages automatic compaction.
+- [Automatic compaction configuration
API](../operations/api-reference.md#automatic-compaction-configuration)
+and [Automatic compaction
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration)
for automatic compaction configuration information.
diff --git a/docs/ingestion/tasks.md b/docs/ingestion/tasks.md
index 54b7661b01..4acece8ce5 100644
--- a/docs/ingestion/tasks.md
+++ b/docs/ingestion/tasks.md
@@ -356,6 +356,9 @@ You can override the task priority by setting your priority
in the task context
The task context is used for various individual task configurations.
Specify task context configurations in the `context` field of the ingestion
spec.
+When configuring [automatic
compaction](../configuration/index.md#automatic-compaction-dynamic-configuration),
set the task context configurations in `taskContext` rather than in `context`.
+The settings get passed into the `context` field of the compaction tasks
issued to MiddleManagers.
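For example, a sketch of an automatic compaction config fragment that passes a context setting through `taskContext` (the dataSource name and value are illustrative):

```json
{
  "dataSource": "wikipedia",
  "taskContext": {
    "priority": 30
  }
}
```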
+
The following parameters apply to all task types.
|property|default|description|
diff --git a/docs/operations/api-reference.md b/docs/operations/api-reference.md
index 7bdb4b4f22..e114b365ad 100644
--- a/docs/operations/api-reference.md
+++ b/docs/operations/api-reference.md
@@ -458,52 +458,52 @@ to filter by interval and limit the number of results
respectively.
Update overlord dynamic worker configuration.
-#### Compaction Status
+#### Automatic compaction status
##### GET
* `/druid/coordinator/v1/compaction/progress?dataSource={dataSource}`
Returns the total size of segments awaiting compaction for the given
dataSource.
-This is only valid for dataSource which has compaction enabled.
+The specified dataSource must have automatic compaction enabled.
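For example, a hypothetical exchange (the response field name and byte count are illustrative, not verified output):

```
GET /druid/coordinator/v1/compaction/progress?dataSource=wikipedia

{"remainingSegmentSize": 107374182400}
```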
##### GET
* `/druid/coordinator/v1/compaction/status`
-Returns the status and statistics from the auto compaction run of all dataSources which have auto compaction enabled in the latest run.
-The response payload includes a list of `latestStatus` objects. Each `latestStatus` represents the status for a dataSource (which has/had auto compaction enabled).
+Returns the status and statistics from the latest auto-compaction run for all dataSources that have auto-compaction enabled.
+The response payload includes a list of `latestStatus` objects. Each `latestStatus` represents the status for a dataSource that has, or previously had, auto-compaction enabled.
The `latestStatus` object has the following keys:
* `dataSource`: name of the datasource for this status information
-* `scheduleStatus`: auto compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING ` if the dataSource has an active auto compaction config submitted otherwise, `NOT_ENABLED`
-* `bytesAwaitingCompaction`: total bytes of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
-* `bytesCompacted`: total bytes of this datasource that are already compacted with the spec set in the auto compaction config.
-* `bytesSkipped`: total bytes of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
-* `segmentCountAwaitingCompaction`: total number of segments of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
-* `segmentCountCompacted`: total number of segments of this datasource that are already compacted with the spec set in the auto compaction config.
-* `segmentCountSkipped`: total number of segments of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
-* `intervalCountAwaitingCompaction`: total number of intervals of this datasource waiting to be compacted by the auto compaction (only consider intervals/segments that are eligible for auto compaction)
-* `intervalCountCompacted`: total number of intervals of this datasource that are already compacted with the spec set in the auto compaction config.
-* `intervalCountSkipped`: total number of intervals of this datasource that are skipped (not eligible for auto compaction) by the auto compaction.
+* `scheduleStatus`: auto-compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING` if the dataSource has an active auto-compaction config submitted. Otherwise, returns `NOT_ENABLED`.
+* `bytesAwaitingCompaction`: total bytes of this datasource waiting to be compacted by auto-compaction. Only considers intervals/segments that are eligible for auto-compaction.
+* `bytesCompacted`: total bytes of this datasource that are already compacted with the spec set in the auto-compaction config.
+* `bytesSkipped`: total bytes of this datasource that auto-compaction skips because they are not eligible.
+* `segmentCountAwaitingCompaction`: total number of segments of this datasource waiting to be compacted by auto-compaction. Only considers intervals/segments that are eligible for auto-compaction.
+* `segmentCountCompacted`: total number of segments of this datasource that are already compacted with the spec set in the auto-compaction config.
+* `segmentCountSkipped`: total number of segments of this datasource that auto-compaction skips because they are not eligible.
+* `intervalCountAwaitingCompaction`: total number of intervals of this datasource waiting to be compacted by auto-compaction. Only considers intervals/segments that are eligible for auto-compaction.
+* `intervalCountCompacted`: total number of intervals of this datasource that are already compacted with the spec set in the auto-compaction config.
+* `intervalCountSkipped`: total number of intervals of this datasource that auto-compaction skips because they are not eligible.
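For illustration, a response from this endpoint might look like the following sketch. The keys match the list above; the datasource name and all values are hypothetical:

```json
{
  "latestStatus": [
    {
      "dataSource": "wikipedia",
      "scheduleStatus": "RUNNING",
      "bytesAwaitingCompaction": 100000,
      "bytesCompacted": 500000,
      "bytesSkipped": 0,
      "segmentCountAwaitingCompaction": 1,
      "segmentCountCompacted": 5,
      "segmentCountSkipped": 0,
      "intervalCountAwaitingCompaction": 1,
      "intervalCountCompacted": 5,
      "intervalCountSkipped": 0
    }
  ]
}
```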
##### GET
* `/druid/coordinator/v1/compaction/status?dataSource={dataSource}`
Similar to the API `/druid/coordinator/v1/compaction/status` above but filters response to only return information for the {dataSource} given.
-Note that {dataSource} given must have/had auto compaction enabled.
+Note that the specified {dataSource} must have auto-compaction enabled, or must have had it enabled previously.
-#### Compaction Configuration
+#### Automatic compaction configuration
##### GET
* `/druid/coordinator/v1/config/compaction`
-Returns all compaction configs.
+Returns all automatic compaction configs.
* `/druid/coordinator/v1/config/compaction/{dataSource}`
-Returns a compaction config of a dataSource.
+Returns the automatic compaction config for a dataSource.
##### POST
@@ -517,15 +517,15 @@ will be set for them.
* `/druid/coordinator/v1/config/compaction`
-Creates or updates the compaction config for a dataSource.
-See [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) for configuration details.
+Creates or updates the automatic compaction config for a dataSource.
+See [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) for configuration details.
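As a sketch, a minimal POST body for this endpoint might look like the following. The datasource name and values are hypothetical, and the linked configuration docs describe the full set of fields:

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```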
##### DELETE
* `/druid/coordinator/v1/config/compaction/{dataSource}`
-Removes the compaction config for a dataSource.
+Removes the automatic compaction config for a dataSource.
#### Server information
diff --git a/docs/operations/segment-optimization.md b/docs/operations/segment-optimization.md
index f79af5ea8f..93229c40b4 100644
--- a/docs/operations/segment-optimization.md
+++ b/docs/operations/segment-optimization.md
@@ -1,6 +1,6 @@
---
id: segment-optimization
-title: "Segment Size Optimization"
+title: "Segment size optimization"
---
<!--
@@ -87,11 +87,11 @@ In this case, you may want to see only rows of the max version per interval (pai
Once you find your segments need compaction, you can consider the below two options:
- - Turning on the [automatic compaction of Coordinators](../design/coordinator.md#compacting-segments).
+ - Turning on the [automatic compaction of Coordinators](../design/coordinator.md#automatic-compaction).
The Coordinator periodically submits [compaction tasks](../ingestion/tasks.md#compact) to re-index small segments.
To enable the automatic compaction, you need to configure it for each dataSource via Coordinator's dynamic configuration.
- See [Compaction Configuration API](../operations/api-reference.md#compaction-configuration)
- and [Compaction Configuration](../configuration/index.md#compaction-dynamic-configuration) for details.
+ See [Automatic compaction configuration API](../operations/api-reference.md#automatic-compaction-configuration)
+ and [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) for details.
- Running periodic Hadoop batch ingestion jobs and using a `dataSource` inputSpec to read from the segments generated by the Kafka indexing tasks.
This might be helpful if you want to compact a lot of segments in parallel.
Details on how to do this can be found on the [Updating existing data](../ingestion/data-management.md#update) section
diff --git a/website/.spelling b/website/.spelling
index 89ad85dd32..d23986e9a0 100644
--- a/website/.spelling
+++ b/website/.spelling
@@ -55,6 +55,7 @@ DRUIDVERSION
DataSketches
DateTime
DateType
+dimensionsSpec
DimensionSpec
DimensionSpecs
Dockerfile
@@ -112,6 +113,7 @@ InputFormat
InputSource
InputSources
Integer.MAX_VALUE
+ioConfig
JBOD
JDBC
JDK
@@ -671,7 +673,6 @@ baseDataSource
baseDataSource-hashCode
classpathPrefix
derivativeDataSource
-dimensionsSpec
druid.extensions.hadoopDependenciesDir
hadoopDependencyCoordinates
maxTaskCount
@@ -1132,7 +1133,6 @@ datetime
f.example.com
filePattern
forceExtendableShardSpecs
-granularitySpec
ignoreInvalidRows
ignoreWhenNoSegments
indexSpecForIntermediatePersists
@@ -1842,7 +1842,6 @@ cpuacct
dataSourceName
datetime
defaultHistory
-dimensionsSpec
doubleMax
doubleMin
doubleSum
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]