jihoonson commented on issue #9712: URL: https://github.com/apache/druid/issues/9712#issuecomment-619165839
@yuanlihan your assessment is correct! Yes, the coordinator should be able to run both minor and major compactions: minor compaction for recent data and major compaction for old data. As you mentioned, minor compaction should be able to run on a subset of the segments in a time chunk instead of grabbing all of them.

> It would be better to only do minor compaction for the first M small segments and the tail N small segments.

This sounds nice, but I'm not sure how we can do it. The auto compaction algorithm used to use segment size as a trigger for compaction, but this caused a bunch of bugs because the segment size after compaction can still be small depending on your configuration, such as maxRowsPerSegment. Also, a parallel task will create at least one small segment in most cases, since the last task will likely be assigned a small number of segments. As a result, we changed the algorithm to be stateful in https://github.com/apache/druid/pull/8573. Do you have a good idea?

> Also found that minor compaction tasks will always fail if the partitionId is not consecutive.
>
> > WARN [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Exception thrown during isReady for task: compact_ds_name_pfjceoge_2020-04-20T03:25:14.299Z
> > org.apache.druid.java.util.common.ISE: Can't compact segments of non-consecutive rootPartition range

Oh yeah, this is a known issue. Just opened https://github.com/apache/druid/issues/9768.
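
To make the two points above concrete, here is a rough Python sketch, not Druid's actual code. The `Segment` class, its fields, and all three function names are hypothetical simplifications invented for illustration. It contrasts a size-based trigger, which keeps firing on segments that stay small after compaction, with a state-based trigger that compares the configuration a segment was last compacted with against the current one. It also shows the kind of consecutive-partition check that the quoted error message suggests.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Segment:
    """Hypothetical, simplified stand-in for a segment (not Druid's DataSegment)."""
    partition_id: int
    size_bytes: int
    # Config the segment was last compacted with; None if never compacted.
    last_compaction_state: Optional[dict] = None


def needs_compaction_by_size(segments: Sequence[Segment], target_bytes: int) -> bool:
    """Size-based trigger: fires whenever any segment is below the target size."""
    return any(s.size_bytes < target_bytes for s in segments)


def needs_compaction_by_state(segments: Sequence[Segment], current_config: dict) -> bool:
    """State-based trigger: fires only if some segment was not compacted
    with the current configuration."""
    return any(s.last_compaction_state != current_config for s in segments)


def has_consecutive_partition_ids(segments: Sequence[Segment]) -> bool:
    """True if the sorted partition IDs form a gap-free range."""
    ids = sorted(s.partition_id for s in segments)
    return all(b - a == 1 for a, b in zip(ids, ids[1:]))


# Segments already compacted with the current config, but still small
# (for example because maxRowsPerSegment caps the row count).
config = {"maxRowsPerSegment": 1000}
compacted = [Segment(0, 10_000_000, config), Segment(1, 2_000_000, config)]

print(needs_compaction_by_size(compacted, 500_000_000))  # size trigger fires again
print(needs_compaction_by_state(compacted, config))      # state trigger does not
print(has_consecutive_partition_ids([Segment(0, 1), Segment(2, 1)]))  # gap at 1
```

With the size-based check, the already-compacted segments would be picked up again on every run, which is the kind of loop a stateful approach avoids.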
