machine424 commented on issue #6816:
URL: https://github.com/apache/druid/issues/6816#issuecomment-1004840022


   Hello @jihoonson,
   
   I think the change you suggested ("Also, the order of killing segments should be changed: the metadata store must be updated first before removing the segment file.") was made by you in https://github.com/apache/druid/commit/db149462073d59e7563f0d3834e69d44a2bb4011.
   
   Now, on master, the code looks like this:
   
   ```java
       final List<DataSegment> unusedSegments = toolbox
           .getTaskActionClient()
           .submit(new RetrieveUnusedSegmentsAction(getDataSource(), getInterval()));
   
       if (!TaskLocks.isLockCoversSegments(taskLockMap, unusedSegments)) {
         throw new ISE(
             "Locks[%s] for task[%s] can't cover segments[%s]",
             taskLockMap.values().stream().flatMap(List::stream).collect(Collectors.toList()),
             getId(),
             unusedSegments
         );
       }
   
       // Kill segments
       toolbox.getTaskActionClient().submit(new SegmentNukeAction(new HashSet<>(unusedSegments)));
       for (DataSegment segment : unusedSegments) {
         toolbox.getDataSegmentKiller().kill(segment);
       }
   ```
   
   I know the goal is to hide unused segments as soon as possible, so that they can't be marked `used` again while they are being deleted. But I'm afraid this order can create zombie segments: if we remove a segment's metadata and then fail to delete its data from deep storage, the files stay in deep storage with nothing in the metadata store pointing to them, so no later kill task will ever find and clean them up.
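To make the failure mode concrete, here is a minimal in-memory simulation in Java. It assumes hypothetical stand-ins (`CurrentKillOrder`, `deleteFromDeepStorage`, `failingDeletes`) for the metadata store and deep storage. These are not Druid classes; the sketch only mimics the order used on master (nuke metadata, then kill files).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the current kill order; these are NOT Druid classes.
public class CurrentKillOrder {
  // Segment ids that still have a row in the (simulated) metadata store.
  final Set<String> metadata = new HashSet<>();
  // Segment ids whose files still exist in (simulated) deep storage.
  final Set<String> deepStorage = new HashSet<>();
  // Segment ids whose deep-storage deletion fails, e.g. because of a transient storage error.
  final Set<String> failingDeletes = new HashSet<>();

  void addUnusedSegment(String id) {
    metadata.add(id);
    deepStorage.add(id);
  }

  // Stand-in for DataSegmentKiller.kill(): throws when the delete fails.
  void deleteFromDeepStorage(String id) {
    if (failingDeletes.contains(id)) {
      throw new RuntimeException("failed to delete " + id);
    }
    deepStorage.remove(id);
  }

  // Same order as on master: drop the metadata first (like SegmentNukeAction), then the files.
  void kill(List<String> unusedSegments) {
    metadata.removeAll(unusedSegments);
    for (String id : unusedSegments) {
      deleteFromDeepStorage(id);
    }
  }

  // Files still in deep storage that no metadata row refers to any more.
  Set<String> orphans() {
    Set<String> result = new HashSet<>(deepStorage);
    result.removeAll(metadata);
    return result;
  }

  public static void main(String[] args) {
    CurrentKillOrder store = new CurrentKillOrder();
    store.addUnusedSegment("seg-1");
    store.addUnusedSegment("seg-2");
    store.failingDeletes.add("seg-2");
    try {
      store.kill(List.of("seg-1", "seg-2"));
    } catch (RuntimeException e) {
      System.out.println("kill task failed: " + e.getMessage());
    }
    // seg-2 has no metadata row left, so a later kill task listing unused segments never sees it.
    System.out.println("orphans: " + store.orphans());
  }
}
```

Running `main` reports the failed delete and then prints `orphans: [seg-2]`: the data is still in deep storage, but its metadata row is already gone.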
   
   I propose we add another column (yeah!) to the segments table, `killed` (or some other name), with these semantics:
   
   - `killed` segments should be hidden from the other components (e.g. the coordinator cannot set them back to `used`). They are like dead tuples in PostgreSQL: their data may still be around, but they are no longer visible.
   
   The kill task would then:
   
   1. List the unused segments to kill (this can include segments already marked as `killed` by a previous run).
   2. Mark these segments as `killed`.
   3. Delete the data of all `killed` segments from deep storage.
   4. Remove the metadata of these segments.
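A minimal in-memory sketch of these steps in Java follows. It assumes a hypothetical `State` enum (`USED`, `UNUSED`, `KILLED`) standing in for the `used` and `killed` columns, and hypothetical stand-ins for the metadata store and deep storage. It ignores intervals and task locks. None of these names are Druid APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the proposed `killed` column; these are NOT Druid classes.
public class KilledColumnSketch {
  enum State { USED, UNUSED, KILLED }

  // Simulated metadata store: one row (segment id -> state) per segment.
  final Map<String, State> metadata = new HashMap<>();
  // Segment ids whose files still exist in (simulated) deep storage.
  final Set<String> deepStorage = new HashSet<>();
  // Segment ids whose deep-storage deletion fails, e.g. because of a transient storage error.
  final Set<String> failingDeletes = new HashSet<>();

  void addSegment(String id, State state) {
    metadata.put(id, state);
    deepStorage.add(id);
  }

  // What other components (e.g. the coordinator) see: killed segments cannot be marked used.
  boolean markAsUsed(String id) {
    State state = metadata.get(id);
    if (state == null || state == State.KILLED) {
      return false;
    }
    metadata.put(id, State.USED);
    return true;
  }

  void kill() {
    // 1. List unused segments, including ones left KILLED by an earlier, partially failed run.
    List<String> candidates = new ArrayList<>();
    for (Map.Entry<String, State> row : metadata.entrySet()) {
      if (row.getValue() != State.USED) {
        candidates.add(row.getKey());
      }
    }
    // 2. Mark them KILLED. Against a real database this would have to be a conditional update
    //    (only rows that are still not used), so a segment re-marked as used in between is skipped.
    List<String> killed = new ArrayList<>();
    for (String id : candidates) {
      if (metadata.get(id) != State.USED) {
        metadata.put(id, State.KILLED);
        killed.add(id);
      }
    }
    // 3. Delete the data of the killed segments; a failed delete leaves the row KILLED for a retry.
    //    Removing an already-deleted file is a no-op here, so the retry is idempotent.
    List<String> deleted = new ArrayList<>();
    for (String id : killed) {
      if (failingDeletes.contains(id)) {
        continue;
      }
      deepStorage.remove(id);
      deleted.add(id);
    }
    // 4. Remove the metadata, in this sketch only for segments whose data is known to be gone.
    for (String id : deleted) {
      metadata.remove(id);
    }
  }

  public static void main(String[] args) {
    KilledColumnSketch store = new KilledColumnSketch();
    store.addSegment("seg-1", State.UNUSED);
    store.addSegment("seg-2", State.UNUSED);
    store.failingDeletes.add("seg-2");
    store.kill();
    // seg-2 keeps a KILLED row, so it is neither orphaned nor usable again.
    System.out.println(store.metadata + " " + store.deepStorage + " " + store.markAsUsed("seg-2"));
    store.failingDeletes.clear();
    store.kill(); // the next run picks seg-2 up again and finishes the job
    System.out.println(store.metadata + " " + store.deepStorage);
  }
}
```

In this sketch, step 4 only removes rows for segments whose data was actually deleted. A failed delete therefore leaves a `killed` row behind, and step 1 of the next run retries it instead of leaving an orphan. The first run prints `{seg-2=KILLED} [seg-2] false`. The second prints `{} []`.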
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
