maytasm commented on code in PR #12416:
URL: https://github.com/apache/druid/pull/12416#discussion_r863290851
##########
docs/ingestion/compaction.md:
##########
@@ -28,23 +28,23 @@ Query performance in Apache Druid depends on optimally sized segments. Compactio
 
 There are several cases to consider compaction for segment optimization:
 
-- With streaming ingestion, data can arrive out of chronological order creating lots of small segments.
+- With streaming ingestion, data can arrive out of chronological order creating many small segments.
 - If you append data using `appendToExisting` for [native batch](native-batch.md) ingestion creating suboptimal segments.
 - When you use `index_parallel` for parallel batch indexing and the parallel ingestion tasks create many small segments.
 - When a misconfigured ingestion task creates oversized segments.
 
 By default, compaction does not modify the underlying data of the segments. However, there are cases when you may want to modify data during compaction to improve query performance:
 
 - If, after ingestion, you realize that data for the time interval is sparse, you can use compaction to increase the segment granularity.
-- Over time you don't need fine-grained granularity for older data so you want use compaction to change older segments to a coarser query granularity. This reduces the storage space required for older data. For example from `minute` to `hour`, or `hour` to `day`.
+- If you don't need fine-grained granularity for older data, you can use compaction to change older segments to a coarser query granularity. For example, from `minute` to `hour` or `hour` to `day`. This reduces the storage space required for older data.
 - You can change the dimension order to improve sorting and reduce segment size.
 - You can remove unused columns in compaction or implement an aggregation metric for older data.
 - You can change segment rollup from dynamic partitioning with best-effort rollup to hash or range partitioning with perfect rollup.
 For more information on rollup, see [perfect vs best-effort rollup](./rollup.md#perfect-rollup-vs-best-effort-rollup).
 
 Compaction does not improve performance in all situations. For example, if you rewrite your data with each ingestion task, you don't need to use compaction. See [Segment optimization](../operations/segment-optimization.md) for additional guidance to determine if compaction will help in your environment.
 
 ## Types of compaction
 
-You can configure the Druid Coordinator to perform automatic compaction, also called auto-compaction, for a datasource. Using a segment search policy, the coordinator periodically identifies segments for compaction starting with the newest to oldest. When it discovers segments that have not been compacted or segments that were compacted with a different or changed spec, it submits compaction task for those segments and only those segments.
+You can configure the Druid Coordinator to perform automatic compaction, also called auto-compaction, for a datasource. Using a segment search policy, the Coordinator periodically identifies segments for compaction starting from newest to oldest. When the Coordinator discovers segments that have not been compacted or segments that were compacted with a different or changed spec, it submits compaction tasks for only those segments.

Review Comment:
   Actually, to be technically correct, it submits compaction tasks for the interval covering those segments.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
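For context on the auto-compaction paragraph under review: auto-compaction is configured per datasource on the Coordinator, and the spec can express the granularity and partitioning changes the doc lists. Below is a rough sketch of such a config; the datasource name and all field values are illustrative, not taken from the PR, so treat it as an assumed example rather than the canonical spec.

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "hour",
    "rollup": true
  },
  "tuningConfig": {
    "partitionsSpec": {
      "type": "hashed",
      "targetRowsPerSegment": 5000000
    }
  }
}
```

A config like this would ask the Coordinator to compact eligible segments (skipping the most recent day) into day-granularity segments with hour query granularity and hash partitioning with perfect rollup, matching the "dynamic to hash/range partitioning" case described in the doc.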
