AmatyaAvadhanula commented on code in PR #16667:
URL: https://github.com/apache/druid/pull/16667#discussion_r1677290200
##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java:
##########
@@ -300,6 +356,81 @@ private NavigableMap<DateTime, List<TaskLock>>
getNonRevokedTaskLockMap(TaskActi
return taskLockMap;
}
+ private List<DataSegment> findUnreferencedSegments(
+ List<DataSegment> unusedSegments,
+ Map<String, String> upgradedFromSegmentIds,
+ TaskActionClient taskActionClient
+ ) throws IOException
+ {
+ if (unusedSegments.isEmpty() || upgradedFromSegmentIds.isEmpty()) {
+ return unusedSegments;
+ }
+
+ final Set<String> upgradedSegmentIds = new HashSet<>();
+ for (DataSegment segment : unusedSegments) {
+ final String id = segment.getId().toString();
+ upgradedSegmentIds.add(id);
+ if (upgradedFromSegmentIds.containsKey(id)) {
+ upgradedSegmentIds.add(upgradedFromSegmentIds.get(id));
+ }
Review Comment:
I've modified the method you suggested.
##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTask.java:
##########
@@ -231,29 +252,64 @@ public TaskStatus runTask(TaskToolbox toolbox) throws
Exception
);
}
- // Kill segments
- // Order is important here: we want the nuke action to clean up the
metadata records _before_ the
- // segments are removed from storage, this helps maintain that we will
always have a storage segment if
- // the metadata segment is present. If the segment nuke throws an
exception, then the segment cleanup is
- // abandoned.
+ // Kill segments. Order is important here:
+ // Retrieve the segment upgrade infos for the batch _before_ the
segments are nuked
+ // We then want the nuke action to clean up the metadata records
_before_ the segments are removed from storage.
+ // This helps maintain that we will always have a storage segment if the
metadata segment is present.
+ // Determine the subset of segments to be killed from deep storage based
on loadspecs.
+ // If the segment nuke throws an exception, then the segment cleanup is
abandoned.
+
+ // Determine upgraded segment ids before nuking
+ final Set<String> segmentIds = unusedSegments.stream()
+ .map(DataSegment::getId)
+ .map(SegmentId::toString)
+
.collect(Collectors.toSet());
+ final Map<String, String> upgradedFromSegmentIds = new HashMap<>();
+ try {
+ upgradedFromSegmentIds.putAll(
+ taskActionClient.submit(
+ new RetrieveUpgradedFromSegmentIdsAction(getDataSource(),
segmentIds)
+ ).getUpgradedFromSegmentIds()
+ );
+ }
+ catch (Exception e) {
+ LOG.warn(
+ e,
+ "Could not retrieve UpgradedFromSegmentsResponse using task
action[retrieveUpgradedFromSegmentIds]."
+ + " Overlord may be on an older version."
+ );
+ }
+
+ // Nuke Segments
+ taskActionClient.submit(new SegmentNukeAction(new
HashSet<>(unusedSegments)));
- toolbox.getTaskActionClient().submit(new SegmentNukeAction(new
HashSet<>(unusedSegments)));
+ // Determine unreferenced segments
+ final List<DataSegment> unreferencedSegments
+ = findUnreferencedSegments(unusedSegments, upgradedFromSegmentIds,
taskActionClient);
- // Kill segments from the deep storage only if their load specs are not
being used by any used segments
- final List<DataSegment> segmentsToBeKilled = unusedSegments
+ // Kill segments from the deep storage only if their load specs are not
being used by any other segments
+ // We still need to check with used load specs as segment upgrades were
introduced before this change
+ final List<DataSegment> segmentsToBeKilled = unreferencedSegments
.stream()
.filter(unusedSegment -> unusedSegment.getLoadSpec() == null
||
!usedSegmentLoadSpecs.contains(unusedSegment.getLoadSpec()))
.collect(Collectors.toList());
Review Comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]