LakshSingla commented on code in PR #14446:
URL: https://github.com/apache/druid/pull/14446#discussion_r1243130993
##########
processing/src/main/java/org/apache/druid/frame/util/DurableStorageUtils.java:
##########
@@ -138,7 +142,7 @@ public static String getOutputsFileNameForPath(
* </ul>
*/
@Nullable
- public static String getControllerTaskIdWithPrefixFromPath(String path)
+ public static String getNextDirNameWithPrefixFromPath(String path)
Review Comment:
Please update the Javadoc for this method.
##########
processing/src/main/java/org/apache/druid/frame/util/DurableStorageUtils.java:
##########
@@ -150,4 +154,23 @@ public static String getControllerTaskIdWithPrefixFromPath(String path)
return null;
}
}
+
+ /**
Review Comment:
Can you update the doc with an example of the query results path as well?
##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/DurableStorageCleaner.java:
##########
@@ -88,13 +88,22 @@ public void run() throws Exception
.map(TaskRunnerWorkItem::getTaskId)
.map(DurableStorageUtils::getControllerDirectory)
.collect(Collectors.toSet());
+ Set<String> knownTaskIds = taskRunner.getKnownTasks()
+ .stream()
+ .map(TaskRunnerWorkItem::getTaskId)
+ .map(DurableStorageUtils::getControllerDirectory)
+ .collect(Collectors.toSet());
Set<String> filesToRemove = new HashSet<>();
while (allFiles.hasNext()) {
String currentFile = allFiles.next();
- String taskIdFromPathOrEmpty = DurableStorageUtils.getControllerTaskIdWithPrefixFromPath(currentFile);
- if (taskIdFromPathOrEmpty != null && !taskIdFromPathOrEmpty.isEmpty()) {
- if (runningTaskIds.contains(taskIdFromPathOrEmpty)) {
+ String nextDirName = DurableStorageUtils.getNextDirNameWithPrefixFromPath(currentFile);
+ if (nextDirName != null && !nextDirName.isEmpty()) {
+ if (runningTaskIds.contains(nextDirName)) {
+ // do nothing
+ } else if (DurableStorageUtils.QUERY_RESULTS_DIR.equals(nextDirName)
Review Comment:
Can you please confirm that this logic works? Suppose __query_results contains
one known id and one unknown id. When we iterate over the unknown id, we would
execute `files.remove(currentFile)`.
Wouldn't `currentFile` be the whole results directory at that point, or am I wrong?
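The scenario in this comment can be checked with a small standalone sketch. It is not the code from the PR: `CleanerSketch`, `firstSegment`, `filesToRemove`, the directory value `__query_results`, and the sample paths are all hypothetical, and it assumes the storage iterator yields full file paths rather than directories. Under that assumption, only the file under the unknown id is selected for removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CleanerSketch
{
  // Hypothetical stand-in for DurableStorageUtils.QUERY_RESULTS_DIR; the real value may differ.
  static final String QUERY_RESULTS_DIR = "__query_results";

  // Hypothetical helper: returns the first path segment, or null if the path has no '/'.
  static String firstSegment(String path)
  {
    int idx = path.indexOf('/');
    return idx <= 0 ? null : path.substring(0, idx);
  }

  // Decides per file (not per directory) which paths should be removed.
  static List<String> filesToRemove(List<String> allFiles, Set<String> runningDirs, Set<String> knownDirs)
  {
    List<String> toRemove = new ArrayList<>();
    for (String currentFile : allFiles) {
      String nextDir = firstSegment(currentFile);
      if (nextDir == null || nextDir.isEmpty()) {
        continue; // not under a recognizable directory, leave it alone
      }
      if (runningDirs.contains(nextDir)) {
        continue; // belongs to a running task
      }
      if (QUERY_RESULTS_DIR.equals(nextDir)) {
        // Look one level deeper: the directory below the results dir identifies the task.
        String resultDir = firstSegment(currentFile.substring(nextDir.length() + 1));
        if (resultDir != null && knownDirs.contains(resultDir)) {
          continue; // results of a known task are kept
        }
      }
      toRemove.add(currentFile);
    }
    return toRemove;
  }

  public static void main(String[] args)
  {
    List<String> allFiles = Arrays.asList(
        "__query_results/controller_known/part0",
        "__query_results/controller_unknown/part0",
        "controller_running/stage_0/part0"
    );
    Set<String> running = new HashSet<>(Arrays.asList("controller_running"));
    Set<String> known = new HashSet<>(Arrays.asList("controller_known"));
    System.out.println(filesToRemove(allFiles, running, known));
  }
}
```

If the iterator instead returned directory entries such as `__query_results` itself, the same check would mark the whole results directory for removal, which is the case the question above is asking about.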
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]