ektravel commented on code in PR #12416:
URL: https://github.com/apache/druid/pull/12416#discussion_r861017549

##########
docs/ingestion/compaction.md:
##########
@@ -28,23 +28,23 @@ Query performance in Apache Druid depends on optimally sized segments. Compactio
 
 There are several cases to consider compaction for segment optimization:
 
-- With streaming ingestion, data can arrive out of chronological order creating lots of small segments.
+- With streaming ingestion, data can arrive out of chronological order creating many small segments.
 - If you append data using `appendToExisting` for [native batch](native-batch.md) ingestion creating suboptimal segments.
 - When you use `index_parallel` for parallel batch indexing and the parallel ingestion tasks create many small segments.
 - When a misconfigured ingestion task creates oversized segments.
 
 By default, compaction does not modify the underlying data of the segments. However, there are cases when you may want to modify data during compaction to improve query performance:
 
 - If, after ingestion, you realize that data for the time interval is sparse, you can use compaction to increase the segment granularity.
-- Over time you don't need fine-grained granularity for older data so you want use compaction to change older segments to a coarser query granularity. This reduces the storage space required for older data. For example from `minute` to `hour`, or `hour` to `day`.
+- Over time you don't need fine-grained granularity for older data so you want use compaction to change older segments to a coarser query granularity. This reduces the storage space required for older data. For example from `minute` to `hour`, or `hour` to `day`.

Review Comment:
```suggestion
- If you don't need fine-grained granularity for older data, you can use compaction to change older segments to a coarser query granularity. For example, from `minute` to `hour` or `hour` to `day`. This reduces the storage space required for older data.
```

-- 
This is an automated message from the Apache Git Service.
