gianm commented on PR #18939: URL: https://github.com/apache/druid/pull/18939#issuecomment-3806730385
> For example, compaction currently operates on a whole interval. If there are many segments, the job takes a very long time to finish (and sometimes it is not realistic to complete it at all). This kind of compaction was originally named 'major compaction', while 'minor compaction' exists to let us compact specific segments. But minor compaction is buggy now: even if we give it 2 segments, for example, the task still fetches all segments in that interval.

As to minor compaction, IMO it should come back and use the mechanisms that exist for concurrent compaction in order to carry through already-compacted segments unchanged (new version, same file on S3) while only performing the physical compaction operation on the new segments. It could be a reshuffle of the new segments, which would require evolving the shard specs so they can attach pruning metadata outside the core set. It could also be a simple merge with no shuffle, which wouldn't be as good for pruning but would execute much more quickly. I lean towards a simple merge being a good idea, with major compaction being used to attach the pruning information.

I believe an in-depth discussion of minor compaction is off-topic for this PR, though, except to acknowledge that if we are having a unified concept of compaction and reindexing, then that unified concept should be broad enough to include minor compaction down the road.
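To make the simple-merge option concrete, here is a minimal Python sketch of how a minor-compaction planner might split one interval's segments: already-compacted segments are carried through with a new version but the same deep-storage file, and only the new segments are merged. Everything here is hypothetical: the `Segment` record, `plan_minor_compaction`, and the merged output path are illustrative stand-ins, not Druid's `DataSegment` class or any real task API, and the "merge" only sums row counts in place of actually rewriting data.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    # Hypothetical, simplified segment descriptor (not Druid's DataSegment).
    interval: str
    version: str
    path: str        # deep-storage location, e.g. an S3 key
    compacted: bool  # True if produced by an earlier compaction
    num_rows: int


def plan_minor_compaction(segments: List[Segment], new_version: str) -> List[Segment]:
    """Sketch of a minor-compaction plan for a single interval (assumption)."""
    if len({s.interval for s in segments}) > 1:
        raise ValueError("sketch assumes all segments share one interval")

    # Carry through compacted segments: new version, same file, no rewrite.
    carried = [replace(s, version=new_version) for s in segments if s.compacted]

    # Only the new, not-yet-compacted segments are physically merged
    # (simple concatenation, no shuffle, so no new pruning metadata).
    fresh = [s for s in segments if not s.compacted]
    plan = list(carried)
    if fresh:
        plan.append(
            Segment(
                interval=fresh[0].interval,
                version=new_version,
                path=f"merged/{new_version}",  # hypothetical output location
                compacted=True,
                num_rows=sum(s.num_rows for s in fresh),
            )
        )
    return plan
```

Carried-through segments keep their original `path`, so only the fresh segments cost any I/O. A reshuffle variant would replace the single merged output with several partitioned outputs, which is where the shard-spec changes mentioned above would come in.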
