[PR] [spark] Make saveAsTable+overwrite behave as INSERT OVERWRITE [paimon]

via GitHub Fri, 12 Jun 2026 21:34:37 -0700


Zouxxyy opened a new pull request, #8225:
URL: https://github.com/apache/paimon/pull/8225


   ### Purpose
   
   Previously, `df.write.mode("overwrite").saveAsTable("t")` produced a
   `ReplaceTableAsSelect` plan, which could drop and recreate the table when the
   user did not re-specify `partitionBy()` and the primary-key options, silently
   losing the partition spec, primary keys, and table properties.
   
   With this PR, `saveAsTable` + `overwrite` on an existing table (Spark 3.4+) is
   rewritten to `OverwriteByExpression` (or `OverwritePartitionsDynamic` when
   `partitionOverwriteMode=dynamic`), preserving the existing table definition.
   This aligns with the behavior of `INSERT OVERWRITE` and is consistent with
   Delta Lake.
   
   SQL `CREATE OR REPLACE TABLE AS SELECT` and V2 `writeTo().replace()` are not
   affected.
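
To make the rule above easier to review, here is a minimal pure-Python sketch of the plan selection as described in this PR. It is not the actual implementation: the function name `plan_for_save_as_table`, its parameters, and the use of `None` to mean "the rewrite does not apply" are all hypothetical, and the treatment of non-overwrite modes and missing tables is an assumption based on the description and the test names.

```python
from typing import Optional, Tuple


def plan_for_save_as_table(
    table_exists: bool,
    save_mode: str,
    partition_overwrite_mode: str = "static",
    spark_version: Tuple[int, int] = (3, 4),
) -> Optional[str]:
    """Hypothetical model of the rewrite described in this PR.

    Returns the name of the logical plan that saveAsTable is rewritten to,
    or None when the rewrite is assumed not to apply and Spark's original
    plan is left untouched.
    """
    # The description limits the change to Spark 3.4+.
    if spark_version < (3, 4):
        return None
    # Only mode("overwrite") is covered by the description.
    if save_mode.lower() != "overwrite":
        return None
    # Assumption: a missing table still goes through the create path,
    # as suggested by the "creates table when not exists" test name.
    if not table_exists:
        return None
    # Dynamic partition overwrite replaces only the partitions being written.
    if partition_overwrite_mode.lower() == "dynamic":
        return "OverwritePartitionsDynamic"
    # Otherwise the existing data is overwritten, keeping the table definition.
    return "OverwriteByExpression"


if __name__ == "__main__":
    print(plan_for_save_as_table(True, "overwrite"))
    print(plan_for_save_as_table(True, "overwrite", "dynamic"))
    print(plan_for_save_as_table(False, "overwrite"))
```

In both rewritten cases the table is not dropped, which is why the partition spec, primary keys, and table properties survive.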
   
   ### Tests
   
   Added cases in `DataFrameWriteTestBase`:
   - `saveAsTable overwrite preserves table definition and snapshots`
   - `saveAsTable overwrite on non-partitioned table`
   - `saveAsTable overwrite creates table when not exists`
   - `saveAsTable overwrite respects dynamic partition overwrite mode`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
