gianm commented on a change in pull request #6782: Improve doc for auto compaction
URL: https://github.com/apache/incubator-druid/pull/6782#discussion_r246176400
 
 

 ##########
 File path: docs/content/design/coordinator.md
 ##########
 @@ -86,17 +86,30 @@ Once a compact task fails, the coordinator simply finds the segments for the int
 
 #### Newest Segment First Policy
 
-This policy searches the segments of _all dataSources_ in inverse order of their intervals.
-For example, let me assume there are 3 dataSources (`ds1`, `ds2`, `ds3`) and 5 segments (`seg_ds1_2017-10-01_2017-10-02`, `seg_ds1_2017-11-01_2017-11-02`, `seg_ds2_2017-08-01_2017-08-02`, `seg_ds3_2017-07-01_2017-07-02`, `seg_ds3_2017-12-01_2017-12-02`) for those dataSources.
-The segment name indicates its dataSource and interval. The search result of newestSegmentFirstPolicy is [`seg_ds3_2017-12-01_2017-12-02`, `seg_ds1_2017-11-01_2017-11-02`, `seg_ds1_2017-10-01_2017-10-02`, `seg_ds2_2017-08-01_2017-08-02`, `seg_ds3_2017-07-01_2017-07-02`].
-
-Every run, this policy starts searching from the (very latest interval - [skipOffsetFromLatest](../configuration/index.html#compaction-dynamic-configuration)).
-This is to handle the late segments ingested to realtime dataSources.
+At every coordinator run, this policy searches segments to compact by iterating segments from the latest to the oldest.
+Once it finds the latest segment among all dataSources, it checks the segment is _compactible_ with other segments of the same dataSource which have the same or abutting intervals.
+Note that segments are compactible if their total size is smaller than or equal to the configured `inputSegmentSizeBytes`.
+
+Here are some details with example. Let us assume we have two dataSources (`foo`, `bar`)
+and 5 segments (`foo_2017-10-01/2017-11-01`, `foo_2017-11-01/2017-12-01`, `bar_2017-08-01/2017-09-01`, `bar_2017-09-01/2017-10-01`, `bar_2017-10-01/2017-11-01`).
+The segment name indicates its dataSource and interval, and each segment has the same size of 10 MB.
+When `inputSegmentSizeBytes` is 20 MB, this policy first returns two segments (`foo_2017-11-01/2017-12-01` and `foo_2017-10-01/2017-11-01`) to compact because
+they are the latest segment and its abutting segment, and their total size is equal to `inputSegmentSizeBytes`.
+
+If the coordinator has enough task slots for compaction, this policy would continue searching the next segments and return
 
 Review comment:
   "continue searching the next segments" should read "continue searching for the next segments".
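
The selection step described in the added paragraphs can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the coordinator's actual implementation: `Segment`, `pick_segments_to_compact`, and the `MB` constant are hypothetical names, and the sketch ignores `skipOffsetFromLatest`, task slots, and every other dataSource after the first pick. It only shows the "latest segment, then same or abutting intervals of the same dataSource, while the total size stays within `inputSegmentSizeBytes`" rule, using the `foo`/`bar` example from the doc.

```python
from dataclasses import dataclass
from datetime import date

MB = 1024 * 1024  # assumption: the doc's "MB" is treated as 2**20 bytes here


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a segment: dataSource, interval, and size."""
    data_source: str
    start: date
    end: date
    size_bytes: int


def pick_segments_to_compact(segments, input_segment_size_bytes):
    """Return the first group of segments to compact, newest first."""
    if not segments:
        return []

    def newest_key(s):
        return (s.end, s.start)

    # Step 1: find the latest segment among all dataSources.
    latest = max(segments, key=newest_key)

    # Step 2: walk that dataSource's segments from newest to oldest.
    same_ds = sorted(
        (s for s in segments if s.data_source == latest.data_source),
        key=newest_key,
        reverse=True,
    )

    picked, total = [], 0
    for seg in same_ds:
        if picked:
            prev = picked[-1]
            same_interval = (seg.start, seg.end) == (prev.start, prev.end)
            abutting = seg.end == prev.start
            if not (same_interval or abutting):
                break  # gap in the timeline: stop extending the group
        # Step 3: segments are compactible only while the total size fits.
        if total + seg.size_bytes > input_segment_size_bytes:
            break
        picked.append(seg)
        total += seg.size_bytes
    return picked


# The five 10 MB segments from the doc example.
example = [
    Segment("foo", date(2017, 10, 1), date(2017, 11, 1), 10 * MB),
    Segment("foo", date(2017, 11, 1), date(2017, 12, 1), 10 * MB),
    Segment("bar", date(2017, 8, 1), date(2017, 9, 1), 10 * MB),
    Segment("bar", date(2017, 9, 1), date(2017, 10, 1), 10 * MB),
    Segment("bar", date(2017, 10, 1), date(2017, 11, 1), 10 * MB),
]
result = pick_segments_to_compact(example, 20 * MB)
for seg in result:
    print(f"{seg.data_source}_{seg.start}/{seg.end}")
```

With a 20 MB limit the sketch picks the two `foo` segments, matching the example in the doc text. A worked example like this might also be a useful check that the wording of the paragraph and the intended behavior agree.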

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
