yuanlihan commented on issue #9712:
URL: https://github.com/apache/druid/issues/9712#issuecomment-620045099


   Hi, @jihoonson 
   
   I like the stateful algorithm for auto compaction.
   
   > This sounds nice, but I'm not sure how we can do it. Auto compaction 
algorithm used to use segment size as a trigger for compaction, but this caused 
a bunch of bugs since the segment size after compaction can be still small 
based on your configuration such as maxRowsPerSegment. Also parallel task will 
create at least one small segment in most cases since the last task will be 
likely assigned small number of segments. As a result, we changed the algorithm 
to be stateful in #8573. Do you have a good idea?
   
   I would like to introduce two extra properties to 
`DataSourceCompactionConfig`:
   | Property | Description | Default |
   | -------- | ----------- | ------- |
   | skipSegmentWithSizeBytesGreaterThan | Filter out segments whose size is greater than this value. Specify this property if you want to use minor compaction. | null |
   | minNumSegmentsToCompact | Minimum number of segments needed to trigger compaction. Setting this to `>= 2` is suggested if you want to use minor compaction. | 1 |
   
   First, filter out big segments from the candidates (`SegmentsToCompact`), 
then find a subset of consecutive segments, and then create a minor compaction 
task with a `segment`-type `inputSpec`. At the same time, this relieves the 
impact of non-consecutive segments. 
   Since the `partitionSpec` type of a minor compaction task should be 
`dynamic` only (is that right?), we can adjust `maxRowsPerSegment` if the 
segments created by minor compaction tasks are still too small.
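
To make the selection step concrete, here is a rough sketch in Python rather than Druid's Java. It is only an illustration under assumptions: the `Segment` class and the function name are hypothetical, the candidates are assumed to be already ordered, and a big segment is assumed to break a run of consecutive small segments. The two parameters mirror the proposed `skipSegmentWithSizeBytesGreaterThan` and `minNumSegmentsToCompact` properties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    # Hypothetical stand-in for a segment; not Druid's own segment class.
    segment_id: str
    size_bytes: int


def select_minor_compaction_runs(
    segments: List[Segment],
    skip_segment_with_size_bytes_greater_than: Optional[int] = None,
    min_num_segments_to_compact: int = 1,
) -> List[List[Segment]]:
    """Return runs of consecutive small segments that are long enough to compact.

    Assumes `segments` is already in candidate order. A segment larger than the
    threshold is skipped and also ends the current run (an assumption).
    """
    threshold = skip_segment_with_size_bytes_greater_than
    runs: List[List[Segment]] = []
    current: List[Segment] = []

    def flush() -> None:
        # Keep the run only if it is non-empty and meets the minimum count.
        if current and len(current) >= min_num_segments_to_compact:
            runs.append(list(current))
        current.clear()

    for segment in segments:
        if threshold is not None and segment.size_bytes > threshold:
            flush()  # big segment: filter it out and close the current run
        else:
            current.append(segment)
    flush()
    return runs


candidates = [
    Segment("s0", 500),
    Segment("s1", 10),
    Segment("s2", 20),
    Segment("s3", 600),
    Segment("s4", 5),
]
runs = select_minor_compaction_runs(
    candidates,
    skip_segment_with_size_bytes_greater_than=100,
    min_num_segments_to_compact=2,
)
run_ids = [[s.segment_id for s in run] for run in runs]
```

In this example only `s1` and `s2` form a run of at least two consecutive small segments, so they would go into one minor compaction task; `s4` is small but alone, so it is left for a later round.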
   
   
   
   
   

