[PR] [spark] Harden dynamic overwrite against optimized child plans [paimon]

via GitHub Sun, 31 May 2026 05:10:51 -0700


kerwin-zk opened a new pull request, #8052:
URL: https://github.com/apache/paimon/pull/8052


   ### Purpose
   `PaimonDynamicPartitionOverwriteCommand` exposes its child query to the Spark 
optimizer through `V2WriteCommand`, but later wraps the same query back into a 
Dataset in `run()` before passing it to `WriteIntoPaimonTable`. This is fragile 
when the child query has already been optimized by Spark. The optimized plan 
may contain optimizer/planner-side placeholders, such as 
`DynamicPruningSubquery`, which are not well suited to being exposed again to 
writer-side Dataset operations.
   
   This PR makes the command-to-writer boundary more robust for the dynamic 
partition overwrite fallback path. Before passing the query to 
`WriteIntoPaimonTable`, it converts the child query into an RDD-backed 
DataFrame via `createNewDataFrame(createDataset(...))`. As a result, the writer 
consumes a clean logical plan instead of directly consuming the possibly 
optimized child plan.
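
   The general idea of an RDD-backed DataFrame can be sketched with public 
Spark APIs only. This is an illustration under stated assumptions, not the 
code in this PR: `detachFromPlan` and `RddBackedDataFrameSketch` are 
hypothetical names, and Paimon's own `createNewDataFrame` / `createDataset` 
helpers may behave differently. The sketch rebuilds a DataFrame from the RDD 
and schema of an existing one, so the result's logical plan should no longer 
carry the original operators (here, a `Filter`).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RddBackedDataFrameSketch {

  // Hypothetical helper (not Paimon's implementation): rebuild `df` on top of
  // its underlying RDD so that downstream Dataset operations start from a
  // fresh logical plan instead of the original plan tree.
  def detachFromPlan(spark: SparkSession, df: DataFrame): DataFrame =
    spark.createDataFrame(df.rdd, df.schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rdd-backed-dataframe-sketch")
      .getOrCreate()
    import spark.implicits._

    // A small query whose logical plan contains a Filter node.
    val source = Seq((1, "a"), (2, "b"), (3, "a"))
      .toDF("id", "pt")
      .filter($"id" > 1)

    val detached = detachFromPlan(spark, source)

    // Inspect both plans: the detached one is built over an RDD scan rather
    // than the original Filter-over-LocalRelation tree.
    println(source.queryExecution.logical)
    println(detached.queryExecution.logical)

    detached.show()
    spark.stop()
  }
}
```

   One trade-off of this approach is that Spark can no longer optimize across 
the boundary (for example, it cannot push later filters into the original 
query), which appears acceptable here because the writer only needs to consume 
the rows.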
   
   ### Tests
   CI


