Zouxxyy opened a new pull request, #8225:
URL: https://github.com/apache/paimon/pull/8225
### Purpose
Previously, `df.write.mode("overwrite").saveAsTable("t")` produced a
`ReplaceTableAsSelect` plan, which drops and recreates the table. If the user
did not re-specify `partitionBy()` and the primary-key options, the partition
spec, primary keys, and table properties were silently lost.
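To make the failure mode concrete, here is a toy Python model. It is not Paimon or Spark code: `TableDef`, `replace_table_as_select`, and `overwrite_in_place` are hypothetical names, and a table definition is reduced to partition keys, primary keys, and properties (the `bucket` property is only illustrative). A replace builds the new definition solely from what the writer passes, while an in-place overwrite keeps the old one.

```python
from dataclasses import dataclass, field


@dataclass
class TableDef:
    """Hypothetical, simplified table definition (not a Paimon class)."""
    partition_keys: tuple = ()
    primary_keys: tuple = ()
    properties: dict = field(default_factory=dict)


def replace_table_as_select(existing, partition_by=(), primary_keys=(), options=None):
    """Model of drop + recreate: `existing` is ignored, so only what the
    writer re-specifies survives."""
    return TableDef(tuple(partition_by), tuple(primary_keys), dict(options or {}))


def overwrite_in_place(existing):
    """Model of an overwrite: rows change, the definition is kept as-is."""
    return existing


existing = TableDef(("dt",), ("id", "dt"), {"bucket": "2"})
print(replace_table_as_select(existing))  # empty definition: keys and properties lost
print(overwrite_in_place(existing))       # original definition preserved
```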
This PR rewrites `saveAsTable` in `overwrite` mode on an existing table
(Spark 3.4+) to `OverwriteByExpression` (or to `OverwritePartitionsDynamic`
when `partitionOverwriteMode=dynamic`), preserving the existing table
definition. This aligns with the behavior of `INSERT OVERWRITE` and is
consistent with Delta Lake.
SQL `CREATE OR REPLACE TABLE AS SELECT` and V2 `writeTo().replace()` are not
affected.
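The routing described above can be summarized as a small decision function. This is a hedged sketch of the PR description, not the actual rule implementation: `plan_for` and the `api` labels are hypothetical, and it assumes that writes which are not rewritten (older Spark, a missing table, or the other replace APIs) keep Spark's original `ReplaceTableAsSelect` plan, which also creates the table when it does not exist.

```python
def plan_for(api, table_exists, spark_version, partition_overwrite_mode="static"):
    """Hypothetical model of which logical plan an overwrite-style write
    ends up as, following the PR description. `api` is an illustrative label
    such as "saveAsTable", "CREATE OR REPLACE TABLE AS SELECT" or
    "writeTo().replace()"."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    rewritten = (
        api == "saveAsTable"          # the other replace APIs are not affected
        and table_exists              # only existing tables are rewritten
        and (major, minor) >= (3, 4)  # Spark 3.4+
    )
    if not rewritten:
        # Assumed: keep Spark's original plan (creates the table if missing).
        return "ReplaceTableAsSelect"
    if partition_overwrite_mode.lower() == "dynamic":
        return "OverwritePartitionsDynamic"
    return "OverwriteByExpression"


print(plan_for("saveAsTable", True, "3.5.1"))             # OverwriteByExpression
print(plan_for("saveAsTable", True, "3.4.0", "dynamic"))  # OverwritePartitionsDynamic
print(plan_for("writeTo().replace()", True, "3.5.1"))     # ReplaceTableAsSelect
```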
### Tests
Added cases in `DataFrameWriteTestBase`:
- `saveAsTable overwrite preserves table definition and snapshots`
- `saveAsTable overwrite on non-partitioned table`
- `saveAsTable overwrite creates table when not exists`
- `saveAsTable overwrite respects dynamic partition overwrite mode`
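The static vs. dynamic distinction checked by the last test can be modelled the same way. In this plain-Python sketch (hypothetical helpers, not Paimon code) a table is a dict from partition value to rows. It assumes that a `saveAsTable` overwrite carries no partition predicate, so the static `OverwriteByExpression` replaces every existing partition, while `OverwritePartitionsDynamic` replaces only the partitions present in the new data.

```python
def group_by_partition(rows, key):
    """Group rows (dicts) by the value of the partition column `key`."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row)
    return grouped


def overwrite_static(table, new_rows, key):
    # Assumed always-true overwrite condition: all old partitions are dropped.
    return group_by_partition(new_rows, key)


def overwrite_dynamic(table, new_rows, key):
    # Untouched partitions survive; partitions present in new_rows are replaced.
    result = {part: list(rows) for part, rows in table.items()}
    result.update(group_by_partition(new_rows, key))
    return result


table = {
    "2024-01-01": [{"id": 1, "dt": "2024-01-01"}],
    "2024-01-02": [{"id": 2, "dt": "2024-01-02"}],
}
new_rows = [{"id": 3, "dt": "2024-01-02"}]

print(sorted(overwrite_static(table, new_rows, "dt")))   # ['2024-01-02']
print(sorted(overwrite_dynamic(table, new_rows, "dt")))  # ['2024-01-01', '2024-01-02']
```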