github-actions[bot] commented on code in PR #61223:
URL: https://github.com/apache/doris/pull/61223#discussion_r2937652231


##########
fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java:
##########
@@ -1544,6 +1548,19 @@ private void outfileWriteSuccess(OutFileClause 
outFileClause) throws Exception {
         }
     }
 
+    private void deleteExistingOutfileFilesInFe(OutFileClause outFileClause) 
throws UserException {
+        // Handle directory cleanup once in FE so parallel outfile writers 
never race on deletion.
+        if (!outFileClause.shouldDeleteExistingFiles()) {
+            return;
+        }
+        if (outFileClause.getBrokerDesc() == null
+                || outFileClause.getBrokerDesc().storageType() == 
StorageType.LOCAL) {
+            return;
+        }
+        
BrokerUtil.deleteParentDirectoryWithFileSystem(outFileClause.getFilePath(), 
outFileClause.getBrokerDesc());
+        outFileClause.markDeleteExistingFilesHandledInFe();
+    }

Review Comment:
   **[Dead Code]** `markDeleteExistingFilesHandledInFe()` sets 
`deleteExistingFiles = false` on this ephemeral `OutFileClause` instance, but:
   1. This instance is never used again (it was created fresh by 
`getOutFileClause()` and passed as a parameter).
   2. The `TResultFileSinkOptions` that carries `deleteExistingFiles` to BE was 
already created in `PhysicalPlanTranslator.visitPhysicalFileSink()` during plan 
translation, from a different `OutFileClause` instance.
   
   This call has no observable effect.



##########
fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java:
##########
@@ -1368,6 +1369,9 @@ public void executeAndSendResult(boolean isOutfileQuery, 
boolean isSendFields,
         coordBase.setIsProfileSafeStmt(this.isProfileSafeStmt());
 
         try {
+            if (isOutfileQuery) {
+                deleteExistingOutfileFilesInFe(queryStmt.getOutFileClause());
+            }

Review Comment:
   **[Critical Bug]** `queryStmt.getOutFileClause()` returns a **new 
`OutFileClause` object on every call** (see 
`LogicalPlanAdapter.getOutFileClause()` which constructs `new 
OutFileClause(...)` each time). This means 
`markDeleteExistingFilesHandledInFe()` inside 
`deleteExistingOutfileFilesInFe()` mutates a throwaway object — the flag is 
never cleared on the `OutFileClause` (or more critically, the 
`TResultFileSinkOptions`) that is actually sent to BE.
   
   Moreover, the `ResultFileSink` containing `TResultFileSinkOptions` with 
`deleteExistingFiles=true` is already baked into the plan fragments during 
`planner.plan()` (in `PhysicalPlanTranslator.visitPhysicalFileSink()`), which 
runs well before this line. Even reaching the "right" `OutFileClause` wouldn't 
help — the Thrift options struct was already copied.
   
   **Result**: FE deletes the directory, but each parallel BE writer still 
receives `deleteExistingFiles=true` and will race on deletion — the original 
bug is NOT fixed.
   
   **Suggested fix**: Either (a) clear `deleteExistingFiles` on the 
`ResultFileSink.fileSinkOptions` in the plan fragment after the FE deletion, or 
(b) move the deletion + flag clearing to **before** `planner.plan()` by 
modifying `LogicalFileSink.properties` to remove the `delete_existing_files` 
key after performing the FE-side deletion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to