kfaraz commented on code in PR #15994:
URL: https://github.com/apache/druid/pull/15994#discussion_r1519397596
##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java:
##########
@@ -207,20 +226,17 @@ public TaskStatus runTask(TaskToolbox toolbox) throws Exception
@Nullable Integer numTotalBatches = getNumTotalBatches();
List<DataSegment> unusedSegments;
LOG.info(
- "Starting kill for datasource[%s] in interval[%s] with batchSize[%d], up to limit[%d] segments "
- + "before maxUsedStatusLastUpdatedTime[%s] will be deleted%s",
- getDataSource(),
- getInterval(),
- batchSize,
- limit,
- maxUsedStatusLastUpdatedTime,
+ "Starting kill for datasource[%s] in interval[%s] and versions[%s] with batchSize[%d], up to limit[%d]"
+ + " segments before maxUsedStatusLastUpdatedTime[%s] will be deleted%s",
+ getDataSource(), getInterval(), getVersions(), batchSize, limit, maxUsedStatusLastUpdatedTime,
numTotalBatches != null ? StringUtils.format(" in [%d] batches.", numTotalBatches) : "."
);
RetrieveUsedSegmentsAction retrieveUsedSegmentsAction = new RetrieveUsedSegmentsAction(
getDataSource(),
null,
ImmutableList.of(getInterval()),
+ getVersions(),
Review Comment:
Sure, @abhishekrb19.
We fetch the set of used segments here to ensure that we do not end up
killing a segment whose load spec is still in use.
As a result of the segment upgrade logic introduced in PR #14407, there can
be multiple segment IDs belonging to different versions that refer to the same
load spec, i.e. the same segment folder on deep storage. So if a segment ID
belonging to version0 is now unused, the actual physical segment may still be
needed by some other segment ID which belongs to version1.
Hope that clarifies things. In conclusion, we shouldn't need to specify
versions while retrieving __used__ segments; filtering the used segments by
version could hide exactly the segment IDs that still need the load spec.
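
To illustrate the reasoning above, here is a minimal, self-contained sketch. It is not the code in `KillUnusedSegmentsTask`: the `Segment` class, the `safeToKill` helper, and the sample paths are hypothetical stand-ins, and the load spec is simplified to a `Map<String, String>`. It only shows why the used segments should be collected across all versions before comparing load specs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LoadSpecCheck
{
  // Hypothetical stand-in for a segment: only the fields needed for this sketch.
  static final class Segment
  {
    final String id;
    final String version;
    final Map<String, String> loadSpec;

    Segment(String id, String version, Map<String, String> loadSpec)
    {
      this.id = id;
      this.version = version;
      this.loadSpec = loadSpec;
    }
  }

  // Returns the unused segments whose load spec is not referenced by any used
  // segment. The used segments are deliberately NOT filtered by version.
  static List<Segment> safeToKill(List<Segment> unused, List<Segment> usedAllVersions)
  {
    Set<Map<String, String>> loadSpecsInUse = new HashSet<>();
    for (Segment segment : usedAllVersions) {
      loadSpecsInUse.add(segment.loadSpec);
    }

    List<Segment> result = new ArrayList<>();
    for (Segment segment : unused) {
      if (!loadSpecsInUse.contains(segment.loadSpec)) {
        result.add(segment);
      }
    }
    return result;
  }

  public static void main(String[] args)
  {
    // Assumed sample data: two folders on deep storage.
    Map<String, String> shared = Map.of("type", "local", "path", "/deep/storage/seg-a");
    Map<String, String> orphan = Map.of("type", "local", "path", "/deep/storage/seg-b");

    List<Segment> unused = List.of(
        new Segment("seg-a_version0", "version0", shared),
        new Segment("seg-b_version0", "version0", orphan)
    );
    // The upgraded segment in version1 points at the same folder as seg-a_version0.
    List<Segment> used = List.of(new Segment("seg-a_version1", "version1", shared));

    for (Segment segment : safeToKill(unused, used)) {
      System.out.println(segment.id); // prints "seg-b_version0"
    }
  }
}
```

If `used` were restricted to version0 in this sketch, `seg-a_version0` would also be reported as safe to kill, even though `seg-a_version1` still needs the same folder.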
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]