github-actions[bot] commented on code in PR #61223:
URL: https://github.com/apache/doris/pull/61223#discussion_r2937652231
##########
fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java:
##########
@@ -1544,6 +1548,19 @@ private void outfileWriteSuccess(OutFileClause
outFileClause) throws Exception {
}
}
+ private void deleteExistingOutfileFilesInFe(OutFileClause outFileClause)
throws UserException {
+ // Handle directory cleanup once in FE so parallel outfile writers
never race on deletion.
+ if (!outFileClause.shouldDeleteExistingFiles()) {
+ return;
+ }
+ if (outFileClause.getBrokerDesc() == null
+ || outFileClause.getBrokerDesc().storageType() ==
StorageType.LOCAL) {
+ return;
+ }
+
BrokerUtil.deleteParentDirectoryWithFileSystem(outFileClause.getFilePath(),
outFileClause.getBrokerDesc());
+ outFileClause.markDeleteExistingFilesHandledInFe();
+ }
Review Comment:
**[Dead Code]** `markDeleteExistingFilesHandledInFe()` sets
`deleteExistingFiles = false` on this ephemeral `OutFileClause` instance, but:
1. This instance is never used again (it was created fresh by
`getOutFileClause()` and passed as a parameter).
2. The `TResultFileSinkOptions` that carries `deleteExistingFiles` to BE was
already created in `PhysicalPlanTranslator.visitPhysicalFileSink()` during plan
translation, from a different `OutFileClause` instance.
This call has no observable effect.
##########
fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java:
##########
@@ -1368,6 +1369,9 @@ public void executeAndSendResult(boolean isOutfileQuery,
boolean isSendFields,
coordBase.setIsProfileSafeStmt(this.isProfileSafeStmt());
try {
+ if (isOutfileQuery) {
+ deleteExistingOutfileFilesInFe(queryStmt.getOutFileClause());
+ }
Review Comment:
**[Critical Bug]** `queryStmt.getOutFileClause()` returns a **new
`OutFileClause` object on every call** (see
`LogicalPlanAdapter.getOutFileClause()` which constructs `new
OutFileClause(...)` each time). This means
`markDeleteExistingFilesHandledInFe()` inside
`deleteExistingOutfileFilesInFe()` mutates a throwaway object — the flag is
never cleared on the `OutFileClause` (or more critically, the
`TResultFileSinkOptions`) that is actually sent to BE.
Moreover, the `ResultFileSink` containing `TResultFileSinkOptions` with
`deleteExistingFiles=true` is already baked into the plan fragments during
`planner.plan()` (in `PhysicalPlanTranslator.visitPhysicalFileSink()`), which
runs well before this line. Even reaching the "right" `OutFileClause` wouldn't
help — the Thrift options struct was already copied.
**Result**: FE deletes the directory, but each parallel BE writer still
receives `deleteExistingFiles=true` and will race on deletion — the original
bug is NOT fixed.
**Suggested fix**: Either (a) clear `deleteExistingFiles` on the
`ResultFileSink.fileSinkOptions` in the plan fragment after the FE deletion, or
(b) move the deletion + flag clearing to **before** `planner.plan()` by
modifying `LogicalFileSink.properties` to remove the `delete_existing_files`
key after performing the FE-side deletion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]