maytasm commented on code in PR #12416:
URL: https://github.com/apache/druid/pull/12416#discussion_r863290851
##########
docs/ingestion/compaction.md:
##########
@@ -28,23 +28,23 @@ Query performance in Apache Druid depends on optimally sized segments. Compactio
 
 There are several cases to consider compaction for segment optimization:
 
-- With streaming ingestion, data can arrive out of chronological order creating lots of small segments.
+- With streaming ingestion, data can arrive out of chronological order creating many small segments.
 - If you append data using `appendToExisting` for [native batch](native-batch.md) ingestion creating suboptimal segments.
 - When you use `index_parallel` for parallel batch indexing and the parallel ingestion tasks create many small segments.
 - When a misconfigured ingestion task creates oversized segments.
 
 By default, compaction does not modify the underlying data of the segments. However, there are cases when you may want to modify data during compaction to improve query performance:
 
 - If, after ingestion, you realize that data for the time interval is sparse, you can use compaction to increase the segment granularity.
-- Over time you don't need fine-grained granularity for older data so you want use compaction to change older segments to a coarser query granularity. This reduces the storage space required for older data. For example from `minute` to `hour`, or `hour` to `day`.
+- If you don't need fine-grained granularity for older data, you can use compaction to change older segments to a coarser query granularity. For example, from `minute` to `hour` or `hour` to `day`. This reduces the storage space required for older data.
 - You can change the dimension order to improve sorting and reduce segment size.
 - You can remove unused columns in compaction or implement an aggregation metric for older data.
 - You can change segment rollup from dynamic partitioning with best-effort rollup to hash or range partitioning with perfect rollup.
 For more information on rollup, see [perfect vs best-effort rollup](./rollup.md#perfect-rollup-vs-best-effort-rollup).
 
 Compaction does not improve performance in all situations. For example, if you rewrite your data with each ingestion task, you don't need to use compaction. See [Segment optimization](../operations/segment-optimization.md) for additional guidance to determine if compaction will help in your environment.
 
 ## Types of compaction
 
-You can configure the Druid Coordinator to perform automatic compaction, also called auto-compaction, for a datasource. Using a segment search policy, the coordinator periodically identifies segments for compaction starting with the newest to oldest. When it discovers segments that have not been compacted or segments that were compacted with a different or changed spec, it submits compaction task for those segments and only those segments.
+You can configure the Druid Coordinator to perform automatic compaction, also called auto-compaction, for a datasource. Using a segment search policy, the Coordinator periodically identifies segments for compaction starting from newest to oldest. When the Coordinator discovers segments that have not been compacted or segments that were compacted with a different or changed spec, it submits compaction tasks for only those segments.

Review Comment:
   Actually, to be technically correct, it submits compaction tasks for the interval covering those segments.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
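For context on the auto-compaction paragraph under review: auto-compaction is configured per datasource on the Coordinator, and the spec can express the granularity and partitioning changes the doc lists. Below is a rough sketch of such a config; the datasource name and all field values are illustrative, not taken from the PR, so treat it as an assumed example rather than the canonical spec.

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": {
    "segmentGranularity": "DAY",
    "queryGranularity": "hour",
    "rollup": true
  },
  "tuningConfig": {
    "partitionsSpec": {
      "type": "hashed",
      "targetRowsPerSegment": 5000000
    }
  }
}
```

A config like this would ask the Coordinator to compact eligible segments (skipping the most recent day) into day-granularity segments with hour query granularity and hash partitioning with perfect rollup, matching the "dynamic to hash/range partitioning" case described in the doc.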
