[GitHub] [druid] 2bethere commented on a change in pull request #10935: First refactor of compaction

GitBox Fri, 05 Mar 2021 13:53:02 -0800


2bethere commented on a change in pull request #10935:
URL: https://github.com/apache/druid/pull/10935#discussion_r588735377




##########
File path: docs/configuration/index.md
##########
@@ -820,24 +820,24 @@ A description of the compaction config is:
 |`taskPriority`|[Priority](../ingestion/tasks.md#priority) of compaction 
task.|no (default = 25)|
 |`inputSegmentSizeBytes`|Maximum number of total segment bytes processed per 
compaction task. Since a time chunk must be processed in its entirety, if the 
segments for a particular time chunk have a total size in bytes greater than 
this parameter, compaction will not run for that time chunk. Because each 
compaction task runs with a single thread, setting this value too far above 
1–2GB will result in compaction tasks taking an excessive amount of time.|no 
(default = 419430400)|
 |`maxRowsPerSegment`|Max number of rows per segment after compaction.|no|
-|`skipOffsetFromLatest`|The offset for searching segments to be compacted. 
Strongly recommended to set for realtime dataSources. |no (default = "P1D")|
+|`skipOffsetFromLatest`|The offset for searching segments to be compacted in 
[ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) duration format. Strongly 
recommended to set for realtime dataSources. See [Data handling with 
compaction](../ingestion/compaction.md#data-handling-with-compaction).|no 
(default = "P1D")|
 |`tuningConfig`|Tuning config for compaction tasks. See below [Compaction Task 
TuningConfig](#compaction-tuningconfig).|no|
 |`taskContext`|[Task context](../ingestion/tasks.md#context) for compaction 
tasks.|no|
+|`granularitySpec`|Custom `granularitySpec` to describe the 
`segmentGranularity` for the compacted segments.|no|
 
 An example of compaction config is:
 
 ```json
 {
-  "dataSource": "wikiticker"
+  "dataSource": "wikiticker",
+  "granularitySpec" : {
+    "segmentGranularity": "none"
+  }
 }
 ```
 
-Note that compaction tasks can fail if their locks are revoked by other tasks 
of higher priorities.
-Since realtime tasks have a higher priority than compaction task by default,
-it can be problematic if there are frequent conflicts between compaction tasks 
and realtime tasks.
-If this is the case, the coordinator's automatic compaction might get stuck 
because of frequent compaction task failures.
-This kind of problem may happen especially in Kafka/Kinesis indexing systems 
which allow late data arrival.
-If you see this problem, it's recommended to set `skipOffsetFromLatest` to 
some large enough value to avoid such conflicts between compaction tasks and 
realtime tasks.
+Compaction tasks fail when higher-priority tasks cause Druid to revoke their 
locks. By default, realtime tasks like ingestion have a higher priority than 
compaction tasks. Therefore, frequent conflicts between compaction tasks and 
realtime tasks can cause the coordinator's automatic compaction to get stuck.

Review comment:
       Nice! This is way clearer.



