yuanlihan commented on issue #9712:
URL: https://github.com/apache/druid/issues/9712#issuecomment-620045099
Hi, @jihoonson
I like the stateful algorithm for auto compaction.
> This sounds nice, but I'm not sure how we can do it. Auto compaction
> algorithm used to use segment size as a trigger for compaction, but this caused
> a bunch of bugs since the segment size after compaction can be still small
> based on your configuration such as maxRowsPerSegment. Also parallel task will
> create at least one small segment in most cases since the last task will be
> likely assigned small number of segments. As a result, we changed the algorithm
> to be stateful in #8573. Do you have a good idea?
I would like to introduce two extra properties to
`DataSourceCompactionConfig`:
| Property | Description | Default |
| --- | --- | --- |
| `skipSegmentWithSizeBytesGreaterThan` | Filter out segments whose size is greater than this value. Specify this property if you want to use minor compaction. | null |
| `minNumSegmentsToCompact` | Minimum number of segments needed to trigger compaction. Set this to `>=2` if you want to use minor compaction. | 1 |
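As a rough illustration, a compaction config carrying the two proposed properties could look like the sketch below. Both property names are only the proposal from this comment and do not exist in Druid today; the datasource name and the threshold value are made-up examples.

```python
import json

# Hypothetical auto-compaction config. The last two keys are the properties
# proposed in this comment; they are not part of Druid's current config.
compaction_config = {
    "dataSource": "wikipedia",  # example datasource name (assumption)
    "skipSegmentWithSizeBytesGreaterThan": 100_000_000,  # example threshold (assumption)
    "minNumSegmentsToCompact": 2,  # >= 2 so a single small segment is not recompacted alone
}

print(json.dumps(compaction_config, indent=2))
```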
The idea is to first filter out big segments from the candidates (`SegmentsToCompact`), then find
a subset of consecutive segments among those that remain, and create a minor compaction task for
that subset with a `segments`-type `inputSpec`. This also reduces the impact of
non-consecutive segments.
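The selection step described above could be sketched roughly as follows. This is not Druid code: `Segment` and `select_minor_compaction_run` are hypothetical names, segments are assumed to arrive in timeline order, and "consecutive" is taken to mean a run of small segments not interrupted by a big one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    # Simplified stand-in for a Druid segment: timeline position and size.
    index: int
    size_bytes: int


def select_minor_compaction_run(
    segments: List[Segment],
    skip_size_greater_than: Optional[int],
    min_num_segments: int = 1,
) -> List[Segment]:
    """Return the first run of consecutive small segments that is long enough, else []."""
    run: List[Segment] = []
    for seg in segments:
        is_big = (
            skip_size_greater_than is not None
            and seg.size_bytes > skip_size_greater_than
        )
        if is_big:
            # A big segment ends the current run of small segments.
            if len(run) >= min_num_segments:
                return run
            run = []
        else:
            run.append(seg)
    return run if len(run) >= min_num_segments else []


segments = [Segment(0, 500), Segment(1, 20), Segment(2, 30), Segment(3, 600), Segment(4, 10)]
run = select_minor_compaction_run(segments, skip_size_greater_than=100, min_num_segments=2)
print([s.index for s in run])  # [1, 2]
```

With `minNumSegmentsToCompact` set to 2, the lone small segment at index 4 is not picked up on its own, which is the case the second property is meant to cover.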
Since the `partitionSpec` of a minor compaction task should only be of type
`dynamic` (is that right?), we can adjust `maxRowsPerSegment` if the segments created
by minor compaction tasks are still too small.