machine424 commented on issue #6816: URL: https://github.com/apache/druid/issues/6816#issuecomment-1004840022
Hello @jihoonson,

I think the change you suggested, `Also, the order of killing segments should be changed: the metadata store must be updated first before removing the segment file.`, was made by you here: https://github.com/apache/druid/commit/db149462073d59e7563f0d3834e69d44a2bb4011.

Now, on master, the code looks like:

```java
final List<DataSegment> unusedSegments = toolbox
    .getTaskActionClient()
    .submit(new RetrieveUnusedSegmentsAction(getDataSource(), getInterval()));

if (!TaskLocks.isLockCoversSegments(taskLockMap, unusedSegments)) {
  throw new ISE(
      "Locks[%s] for task[%s] can't cover segments[%s]",
      taskLockMap.values().stream().flatMap(List::stream).collect(Collectors.toList()),
      getId(),
      unusedSegments
  );
}

// Kill segments
toolbox.getTaskActionClient().submit(new SegmentNukeAction(new HashSet<>(unusedSegments)));
for (DataSegment segment : unusedSegments) {
  toolbox.getDataSegmentKiller().kill(segment);
}
```

I know it aims to hide the unused segment as soon as possible, to prevent it from being made `used` while it is being deleted. But I'm afraid this may create zombie unused segments: if we remove a segment from the metadata store and then fail to delete its data from deep storage, nothing references that data anymore.

I propose we add another column (yeah!) `killed` (or something else):

- `killed` segments should be hidden from the other components (the coordinator cannot set them to `used`, etc.). Like dead tuples on PG: their data may still be around, but they are not available.

The kill task would then:

1. List unused segments to kill (this can contain segments that are already marked as `killed`)
2. Mark the segments as `killed`
3. Delete the data of all `killed` segments
4. Remove the metadata of these segments

What do you think?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
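For illustration, the four proposed steps could be sketched roughly as follows. This is a minimal, in-memory sketch and not Druid code: `KilledColumnSketch`, `MetadataStore`, `DeepStorage`, `State`, and all method names are hypothetical stand-ins, and the `KILLED` state plays the role of the proposed `killed` column. Unlike the numbered list, it removes each metadata row right after that segment's data is deleted rather than in a separate pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KilledColumnSketch {
  // Hypothetical segment states; KILLED models the proposed extra column.
  enum State { USED, UNUSED, KILLED }

  // Hypothetical stand-in for the metadata store (segment id -> state).
  static class MetadataStore {
    final Map<String, State> rows = new LinkedHashMap<>();

    // Step 1: everything that is not USED, including segments already
    // marked KILLED by an earlier run that failed part-way through.
    List<String> retrieveKillable() {
      List<String> ids = new ArrayList<>();
      for (Map.Entry<String, State> e : rows.entrySet()) {
        if (e.getValue() != State.USED) {
          ids.add(e.getKey());
        }
      }
      return ids;
    }

    // What other components would call: only UNUSED segments can be
    // brought back, so KILLED ones stay hidden.
    boolean markUsed(String id) {
      if (rows.get(id) != State.UNUSED) {
        return false;
      }
      rows.put(id, State.USED);
      return true;
    }
  }

  // Hypothetical stand-in for deep storage; a delete may fail.
  interface DeepStorage {
    void delete(String id) throws Exception;
  }

  static void kill(MetadataStore meta, DeepStorage storage) {
    List<String> toKill = meta.retrieveKillable();   // step 1
    for (String id : toKill) {
      meta.rows.put(id, State.KILLED);               // step 2
    }
    for (String id : toKill) {
      try {
        storage.delete(id);                          // step 3
        meta.rows.remove(id);                        // step 4, only after a successful delete
      } catch (Exception e) {
        // The row stays KILLED, so a later run picks it up again in step 1.
      }
    }
  }
}
```

With this ordering, a failed deep-storage delete leaves a `KILLED` row behind instead of orphaned data, and calling `kill` again retries it.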
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
