[PR] [spark] Reject ALTER TABLE REPLACE COLUMNS to avoid silent data corruption [paimon]

via GitHub Mon, 15 Jun 2026 21:00:24 -0700


huangxiaopingRD opened a new pull request, #8246:
URL: https://github.com/apache/paimon/pull/8246


   ## Summary
   
   Spark translates `ALTER TABLE ... REPLACE COLUMNS` into a batch that drops 
every existing column and re-adds the new set (a combination of `DeleteColumn` 
and `AddColumn` changes). For Paimon this is unsafe: re-adding columns assigns 
brand-new field ids while existing data files keep the old ids, so same-named 
columns are treated as new columns and read back as `null`, which is silent 
data corruption.
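
   As a rough illustration of why id-based resolution produces `null` after a drop and re-add, here is a minimal standalone sketch. It is not Paimon code: `FieldIdDemo`, `Field`, and `read` are hypothetical names, and the map keyed by field id is only a stand-in for a data file written under the old schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldIdDemo {
    // Hypothetical stand-in for a schema field: a stable id plus a display name.
    public static final class Field {
        public final int id;
        public final String name;

        public Field(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Resolve each field of the current schema against a stored row by field id,
    // not by name. An id that the file does not contain resolves to null.
    public static List<Object> read(List<Field> schema, Map<Integer, Object> rowByFieldId) {
        List<Object> out = new ArrayList<>();
        for (Field f : schema) {
            out.add(rowByFieldId.get(f.id));
        }
        return out;
    }

    public static void main(String[] args) {
        // A row written while the schema was (id -> field id 0, name -> field id 1).
        Map<Integer, Object> row = new HashMap<>();
        row.put(0, 1);
        row.put(1, "alice");

        List<Field> original = Arrays.asList(new Field(0, "id"), new Field(1, "name"));
        // After drop + re-add, the same names carry fresh ids (2 and 3 here, by assumption).
        List<Field> replaced = Arrays.asList(new Field(2, "id"), new Field(3, "name"));

        System.out.println(read(original, row)); // values resolve by id
        System.out.println(read(replaced, row)); // same names, new ids: nothing matches
    }
}
```

   The column names are identical in both schemas; only the ids differ, which is why the failure produces no error and is easy to miss.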
   
   This PR detects that change pattern in `SparkCatalog.alterTable` and throws 
an `UnsupportedOperationException` with a clear message pointing users to 
`RENAME COLUMN` / `ALTER COLUMN TYPE` / `DROP COLUMN` / `ADD COLUMN` instead.
   
   The detection matches only batches made up exclusively of `DeleteColumn` and 
`AddColumn` changes, so a legitimate mixed batch (e.g. a programmatic DROP + 
RENAME) is not mistaken for a replace.
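
   The shape of such a check can be sketched as follows. This is a simplified standalone illustration, not the code in this PR: `ReplaceColumnsCheck`, `ChangeKind`, `looksLikeReplaceColumns`, and `validate` are hypothetical names, the enum stands in for Spark's column change types, and the exact conditions and message text used by `SparkCatalog.alterTable` are assumptions.

```java
import java.util.List;

public class ReplaceColumnsCheck {
    // Hypothetical stand-in for the kinds of column change in an ALTER TABLE batch.
    public enum ChangeKind { ADD_COLUMN, DELETE_COLUMN, RENAME_COLUMN, UPDATE_COLUMN_TYPE }

    // True only when the batch contains at least one delete, at least one add,
    // and no other kind of change. Any other kind makes it a mixed batch.
    public static boolean looksLikeReplaceColumns(List<ChangeKind> changes) {
        boolean sawDelete = false;
        boolean sawAdd = false;
        for (ChangeKind kind : changes) {
            if (kind == ChangeKind.DELETE_COLUMN) {
                sawDelete = true;
            } else if (kind == ChangeKind.ADD_COLUMN) {
                sawAdd = true;
            } else {
                return false;
            }
        }
        return sawDelete && sawAdd;
    }

    // Reject the batch up front instead of applying it. Message text is illustrative.
    public static void validate(List<ChangeKind> changes) {
        if (looksLikeReplaceColumns(changes)) {
            throw new UnsupportedOperationException(
                "REPLACE COLUMNS is not supported; use RENAME COLUMN, "
                    + "ALTER COLUMN TYPE, DROP COLUMN, or ADD COLUMN instead.");
        }
    }
}
```

   Under these assumptions, a batch of `DELETE_COLUMN` + `RENAME_COLUMN` passes validation, while `DELETE_COLUMN` + `ADD_COLUMN` is rejected before any schema change is applied.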
   
   ## Tests
   
   Added `SparkSchemaEvolutionITCase#testReplaceColumnsUnsupported` verifying 
the operation is rejected with the expected exception.


