gianm commented on PR #18939:
URL: https://github.com/apache/druid/pull/18939#issuecomment-3806730385

   > For example, compaction currently operates on a whole interval. If there are many segments, the job takes a very long time to finish (and sometimes it is not realistic to complete at all). This kind of compaction was originally called 'major compaction', while 'minor compaction' exists to let us compact specific segments. However, minor compaction is currently buggy: even if we give it 2 segments, the task still fetches all segments in that interval.
   
   As to minor compaction, IMO it should come back and use the mechanisms that exist for concurrent compaction to carry through already-compacted segments unchanged (new version, same file on S3), while performing the physical compaction operation only on the new segments.
   
   It could be a reshuffle of the new segments, which would require evolving the shard specs so that they can attach pruning metadata outside the core set. It could also be a simple merge with no shuffle, which wouldn't be as good for pruning but would execute much more quickly. I lean towards a simple merge being a good idea, with major compaction being used to attach the pruning information.
   
   I believe an in-depth discussion of minor compaction is off-topic for this PR, though, except to acknowledge that if we are going to have a unified concept of compaction and reindexing, that concept should be broad enough to include minor compaction down the road.
