vtlim commented on a change in pull request #12362:
URL: https://github.com/apache/druid/pull/12362#discussion_r833743248
##########
File path: docs/ingestion/compaction.md
##########
@@ -46,34 +49,39 @@ You can configure the Druid Coordinator to perform
automatic compaction, also ca
Automatic compaction works in most use cases and should be your first option.
To learn more about automatic compaction, see [Compacting
Segments](../design/coordinator.md#compacting-segments).
In cases where you require more control over compaction, you can manually
submit compaction tasks. For example:
+
- Automatic compaction is running into the limit of task slots available to
it, so tasks are waiting for previous automatic compaction tasks to complete.
Manual compaction can use all available task slots, therefore you can complete
compaction more quickly by submitting more concurrent tasks for more intervals.
- You want to force compaction for a specific time range or you want to
compact data out of chronological order.
See [Setting up a manual compaction task](#setting-up-manual-compaction) for
more about manual compaction tasks.
## Data handling with compaction
+
During compaction, Druid overwrites the original set of segments with the
compacted set. Druid also locks the segments for the time interval being
compacted to ensure data consistency. By default, compaction tasks do not
modify the underlying data. You can configure the compaction task to change the
query granularity or add or remove dimensions in the compaction task. This
means that the only changes to query results should be the result of
intentional, not automatic, changes.
You can set `dropExisting` in `ioConfig` to "true" in the compaction task to
configure Druid to replace all existing segments fully contained by the
interval. See the suggestion for reindexing with finer granularity under
[Implementation considerations](native-batch.md#implementation-considerations)
for an example.
-> WARNING: `dropExisting` in `ioConfig` is a beta feature.
+> WARNING: `dropExisting` in `ioConfig` is a beta feature.
If an ingestion task needs to write data to a segment for a time interval
locked for compaction, by default the ingestion task supersedes the compaction
task and the compaction task fails without finishing. For manual compaction
tasks you can adjust the input spec interval to avoid conflicts between
ingestion and compaction. For automatic compaction, you can set the
`skipOffsetFromLatest` key to adjust the auto compaction starting point from
the current time to reduce the chance of conflicts between ingestion and
compaction. See [Compaction dynamic
configuration](../configuration/index.md#compaction-dynamic-configuration) for
more information. Another option is to set the compaction task to higher
priority than the ingestion task.
### Segment granularity handling
-Unless you modify the segment granularity in the [granularity
spec](#compaction-granularity-spec), Druid attempts to retain the granularity
for the compacted segments. When segments have different segment granularities
with no overlap in interval Druid creates a separate compaction task for each
to retain the segment granularity in the compacted segment.
+Unless you modify the segment granularity in
[`granularitySpec`](#compaction-granularity-spec), Druid attempts to retain the
granularity for the compacted segments. When segments have different segment
granularities with no overlap in interval Druid creates a separate compaction
task for each to retain the segment granularity in the compacted segment.
+
+If segments have different segment granularities before compaction but there
is some overlap in interval, Druid attempts find start and end of the
overlapping interval and uses the closest segment granularity level for the
compacted segment.
-If segments have different segment granularities before compaction but there
is some overlap in interval, Druid attempts find start and end of the
overlapping interval and uses the closest segment granularity level for the
compacted segment. For example consider two overlapping segments: segment "A"
for the interval 01/01/2021-01/02/2021 with day granularity and segment "B" for
the interval 01/01/2021-02/01/2021. Druid attempts to combine and compacted the
overlapped segments. In this example, the earliest start time for the two
segments above is 01/01/2020 and the latest end time of the two segments above
is 02/01/2020. Druid compacts the segments together even though they have
different segment granularity. Druid uses month segment granularity for the
newly compacted segment even though segment A's original segment granularity
was DAY.
+For example consider two overlapping segments: segment "A" for the interval
01/01/2021-01/02/2021 with day granularity and segment "B" for the interval
01/01/2021-02/01/2021. Druid attempts to combine and compacted the overlapped
segments. In this example, the earliest start time for the two segments above
is 01/01/2020 and the latest end time of the two segments above is 02/01/2020.
Druid compacts the segments together even though they have different segment
granularity. Druid uses month segment granularity for the newly compacted
segment even though segment A's original segment granularity was DAY.
Review comment:
```suggestion
For example, consider two overlapping segments: segment "A" for the interval
01/01/2021-01/02/2021 with day granularity and segment "B" for the interval
01/01/2021-02/01/2021. Druid attempts to combine and compact the overlapped
segments. In this example, the earliest start time for the two segments is
01/01/2021 and the latest end time of the two segments is 02/01/2021. Druid
compacts the segments together even though they have different segment
granularity. Druid uses month segment granularity for the newly compacted
segment even though segment A's original segment granularity was DAY.
```
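To sanity-check the dates in the example, here is a minimal sketch of the interval arithmetic the paragraph describes. It is not Druid's implementation: the function names, the granularity ladder, and the span thresholds are all assumptions made for illustration. It takes the earliest start and latest end of the overlapping segments, then picks the smallest granularity that covers that span.

```python
from datetime import date, timedelta

# Hypothetical granularity ladder. The maximum spans are illustrative
# assumptions, not values taken from Druid.
GRANULARITY_MAX_SPAN = [
    ("DAY", timedelta(days=1)),
    ("WEEK", timedelta(days=7)),
    ("MONTH", timedelta(days=31)),
    ("YEAR", timedelta(days=366)),
]


def umbrella_interval(intervals):
    """Return (earliest start, latest end) across all segment intervals."""
    starts = [start for start, _ in intervals]
    ends = [end for _, end in intervals]
    return min(starts), max(ends)


def closest_granularity(start, end):
    """Pick the smallest granularity in the ladder that covers the span."""
    span = end - start
    for name, max_span in GRANULARITY_MAX_SPAN:
        if span <= max_span:
            return name
    return "ALL"


# Segment "A": 01/01/2021-01/02/2021 (day granularity in the example).
segment_a = (date(2021, 1, 1), date(2021, 1, 2))
# Segment "B": 01/01/2021-02/01/2021.
segment_b = (date(2021, 1, 1), date(2021, 2, 1))

start, end = umbrella_interval([segment_a, segment_b])
granularity = closest_granularity(start, end)
print(start, end, granularity)
```

Under these assumptions the combined interval runs from 01/01/2021 to 02/01/2021 and resolves to month granularity, which matches the outcome described in the paragraph.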