jihoonson opened a new issue #10999:
URL: https://github.com/apache/druid/issues/10999


### Affected Version

The master branch

### Description

#10843 added support for segment granularity in auto compaction. This change can cause auto compaction to fail to find candidate segments for compaction when those segments have mixed versions, especially when the segment granularity is changed from a smaller one to a larger one. When the segment granularity is changed, auto compaction internally creates another timeline, which is populated based on the new segment granularity. Here is a code snippet showing [how we populate the new timeline](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/duty/NewestSegmentFirstIterator.java#L135-L148).
   
```java
DataSegment segmentsForCompact = segment.withShardSpec(new NumberedShardSpec(partitionNum, partitions));
// PartitionHolder can only holds chunks of one partition space
// However, partition in the new timeline (timelineWithConfiguredSegmentGranularity) can be hold multiple
// partitions of the original timeline (when the new segmentGranularity is larger than the original
// segmentGranularity). Hence, we group all the segments of the original timeline into intervals bucket
// by the new configuredSegmentGranularity. We then convert each segment into a new partition space so that
// there is no duplicate partitionNum across all segments of each new Interval. We will have to save the
// original ShardSpec to convert the segment back when returning from the iterator.
originalShardSpecs.put(new Pair<>(interval, segmentsForCompact.getId()), segment.getShardSpec());
timelineWithConfiguredSegmentGranularity.add(
    interval,
    segmentsForCompact.getVersion(),
    NumberedPartitionChunk.make(partitionNum, partitions, segmentsForCompact)
);
```
   
As shown in the snippet, we use the segment version directly when populating the new timeline. Since `interval` in the snippet is a time chunk based on the new segment granularity, segments of mixed versions can be added to the same time chunk in the new timeline. Finally, we replace the shardSpec of those segments with a new one whose `partitions` equals the number of segments in the new time chunk. As a result, none of those mixed-version segments will be visible, because there will always be fewer non-overshadowed segments than `partitions` in that time chunk.
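To make the failure mode concrete, here is a minimal, self-contained sketch of the re-bucketing step in plain Java. It does not use Druid's classes: `Segment`, `Chunk`, `visibleAfterRebucketing`, and the DAY-to-MONTH granularity change are hypothetical, and the visibility rule is a simplifying assumption rather than the real timeline logic. Within one time chunk only the highest version is considered (versions compared as strings), and it counts as visible only if it holds all `partitions` chunks. Records require Java 16+.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MixedVersionRebucketing {

    // Hypothetical stand-in for a DAY-granularity DataSegment: its day and its version string.
    record Segment(LocalDate day, String version) {}

    // Hypothetical stand-in for a segment wrapped with NumberedShardSpec(partitionNum, partitions).
    record Chunk(Segment segment, int partitionNum, int partitions) {}

    static Map<YearMonth, List<Segment>> visibleAfterRebucketing(List<Segment> segments) {
        // Group the original segments into the new, coarser (MONTH) time chunks.
        Map<YearMonth, List<Segment>> byMonth = new TreeMap<>();
        for (Segment s : segments) {
            byMonth.computeIfAbsent(YearMonth.from(s.day()), k -> new ArrayList<>()).add(s);
        }

        Map<YearMonth, List<Segment>> visible = new TreeMap<>();
        for (Map.Entry<YearMonth, List<Segment>> entry : byMonth.entrySet()) {
            List<Segment> bucket = entry.getValue();
            int partitions = bucket.size(); // one partition space per new time chunk

            // Renumber every segment in the bucket but keep its ORIGINAL version,
            // so the chunks are split across versions within a single time chunk.
            TreeMap<String, List<Chunk>> byVersion = new TreeMap<>();
            for (int i = 0; i < partitions; i++) {
                Segment s = bucket.get(i);
                byVersion.computeIfAbsent(s.version(), k -> new ArrayList<>())
                         .add(new Chunk(s, i, partitions));
            }

            // Simplified rule (assumption): the highest version overshadows the rest,
            // and it is visible only if it holds all `partitions` chunks.
            List<Chunk> newest = byVersion.lastEntry().getValue();
            List<Segment> result = new ArrayList<>();
            if (newest.size() == partitions) {
                for (Chunk c : newest) {
                    result.add(c.segment());
                }
            }
            visible.put(entry.getKey(), result);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment(LocalDate.of(2021, 1, 1), "2021-02-01T00:00:00.000Z"),
            new Segment(LocalDate.of(2021, 1, 2), "2021-02-01T00:00:00.000Z"),
            // Suppose this day was re-ingested later, so it carries a newer version.
            new Segment(LocalDate.of(2021, 1, 3), "2021-03-01T00:00:00.000Z")
        );
        Map<YearMonth, List<Segment>> visible = visibleAfterRebucketing(segments);
        // Each day is visible on its own at DAY granularity, but nothing is visible in the MONTH chunk.
        System.out.println(visible.get(YearMonth.of(2021, 1)).size()); // prints 0
    }
}
```

If all three days share one version, the same call returns all three segments, since the newest (and only) version then holds every renumbered partition. The problem appears only once versions are mixed inside the coarser time chunk.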

