GWphua commented on code in PR #19187:
URL: https://github.com/apache/druid/pull/19187#discussion_r2973554284


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexSupervisorTask.java:
##########
@@ -957,6 +964,62 @@ TaskStatus runRangePartitionMultiPhaseParallel(TaskToolbox toolbox) throws Excep
     return taskStatus;
   }
 
+  /**
+   * Cleans up deep storage shuffle data produced during phase 1 of multi-phase parallel indexing.
+   * <p>
+   * Cleanup is performed here in the supervisor task rather than in
+   * {@link org.apache.druid.indexing.worker.shuffle.DeepStorageIntermediaryDataManager} because of
+   * the process model: phase-1 sub-tasks run as separate peon processes that exit before phase 2
+   * starts. Each sub-task's DeepStorageIntermediaryDataManager instance is destroyed when the peon
+   * exits, so no surviving manager instance has knowledge of what files were pushed. The supervisor
+   * task is the only entity that is both alive after phase 2 completes and has the complete set of
+   * loadSpecs (collected from all sub-task reports).
+   * <p>
+   * This method constructs minimal {@link DataSegment} objects from {@link DeepStoragePartitionStat} loadSpecs and
+   * delegates deletion to the appropriate storage-specific {@link DataSegmentKiller}.
+   *
+   * @param killer  the segment killer from {@link TaskToolbox#getDataSegmentKiller()}.
+   * @param reports phase-1 sub-task reports containing partition stats with loadSpecs,
+   *                may be null or empty if phase 1 produced no output.
+   */
+  @VisibleForTesting
+  static void cleanupDeepStorageShuffleData(
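
For orientation, the pattern the Javadoc describes — build a minimal segment descriptor per collected loadSpec and delegate deletion to the killer — can be sketched in a self-contained form. All types below are simplified stand-ins, not Druid's actual `DataSegment`/`DataSegmentKiller` classes, and the method shape is an assumption, not the PR's real code:

```java
import java.util.List;
import java.util.Map;

public class CleanupSketch
{
  // Simplified stand-in for org.apache.druid.timeline.DataSegment
  record Segment(Map<String, Object> loadSpec) {}

  // Simplified stand-in for DataSegmentKiller
  interface Killer
  {
    void kill(Segment segment);
  }

  // Mirrors the described flow: the supervisor, holding every sub-task's
  // loadSpecs, wraps each one in a minimal segment and asks the killer to
  // delete the corresponding deep-storage file.
  static void cleanupShuffleData(Killer killer, List<Map<String, Object>> loadSpecs)
  {
    if (loadSpecs == null || loadSpecs.isEmpty()) {
      return; // phase 1 produced no output
    }
    for (Map<String, Object> spec : loadSpecs) {
      killer.kill(new Segment(spec));
    }
  }

  public static void main(String[] args)
  {
    cleanupShuffleData(
        s -> System.out.println("killed: " + s.loadSpec().get("path")),
        List.of(Map.of("path", "shuffle-data/task1/0_index.zip"))
    );
  }
}
```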

Review Comment:
   I did a test on HDFS. Here are some findings:
   1. The HDFS shuffle data are stored somewhere like the following:
   ```
   segments/shuffle-data/<taskId>/<startInterval>/<endInterval>/<partition>/<PartialIndexGeneratorTaskId>/0_index.zip
   ```
   
   2. The current implementation of `kill` in `HdfsDataSegmentKiller` hard-codes a removal depth of 2 to 3 parent levels, meaning we will still see **EMPTY** directories like the following after all segments are killed:
   
   ```
   -- Before --
   segments/shuffle-data/<taskId>/<startInterval>/<endInterval>/<partition>/<PartialIndexGeneratorTaskId>/0_index.zip
   
   -- After -- (Either can happen)
   segments/shuffle-data/<taskId>/<startInterval> (Deleted 3 layers up)
   segments/shuffle-data/<taskId>/<startInterval>/<endInterval> (Deleted 2 layers up)
   ```
   
   3. I will add a new method to the `DataSegmentKiller` interface solely for handling intermediary data, with a default implementation that delegates to the existing `kill` method. Users of other storage forms are welcome to implement their own versions.
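
The leftover-directory behavior in point 2 can be reproduced locally. The sketch below uses a temp directory on the local filesystem as a stand-in for HDFS and simulates a kill that removes the file plus a fixed two parent levels; the directory names are illustrative, not real task IDs:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyDirDemo
{
  public static void main(String[] args) throws IOException
  {
    // Recreate the shuffle-data layout:
    // <taskId>/<startInterval>/<endInterval>/<partition>/<subTaskId>/0_index.zip
    Path root = Files.createTempDirectory("shuffle-data");
    Path file = root.resolve("task1/2024-01-01/2024-01-02/0/subtask1/0_index.zip");
    Files.createDirectories(file.getParent());
    Files.createFile(file);

    // Simulate a kill that deletes the file and only two parent levels,
    // analogous to the hard-coded depth described above.
    Files.delete(file);
    Files.delete(file.getParent());               // .../subtask1
    Files.delete(file.getParent().getParent());   // .../0 (partition dir)

    // The interval directories survive, now empty.
    System.out.println(Files.exists(root.resolve("task1/2024-01-01/2024-01-02"))); // prints true
  }
}
```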
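
The interface change proposed in point 3 could look roughly like the following self-contained sketch. The method name `killIntermediaryData` and all types other than the existing `kill` concept are assumptions standing in for Druid's actual classes, not the final API:

```java
import java.util.ArrayList;
import java.util.List;

public class KillerSketch
{
  // Simplified stand-in for org.apache.druid.timeline.DataSegment
  record Segment(String path) {}

  interface DataSegmentKiller
  {
    void kill(Segment segment);

    // Proposed addition (name assumed): defaults to kill(), so existing
    // implementations keep working unchanged; an HDFS implementation could
    // override it to also prune the empty interval directories.
    default void killIntermediaryData(Segment segment)
    {
      kill(segment);
    }
  }

  static class RecordingKiller implements DataSegmentKiller
  {
    final List<String> killed = new ArrayList<>();

    @Override
    public void kill(Segment segment)
    {
      killed.add(segment.path());
    }
  }

  public static void main(String[] args)
  {
    RecordingKiller killer = new RecordingKiller();
    // Without an override, the default method falls through to kill().
    killer.killIntermediaryData(new Segment("segments/shuffle-data/task1/0_index.zip"));
    System.out.println(killer.killed); // prints [segments/shuffle-data/task1/0_index.zip]
  }
}
```

Defaulting in the interface keeps the change backward compatible: every existing `DataSegmentKiller` implementation compiles and behaves as before, and only storage backends with directory-cleanup concerns need to override.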
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

