[PR] [spark] Decouple type-widening from merge-schema with an explicit switch [paimon]

via GitHub Sat, 30 May 2026 19:57:53 -0700


Zouxxyy opened a new pull request, #8042:
URL: https://github.com/apache/paimon/pull/8042


   ### Purpose
   
   Today `write.merge-schema` couples column-addition with *unconditional* type 
widening. That has two problems: it can attempt unsupported widenings (e.g. 
`ARRAY<INT>` -> `ARRAY<BIGINT>`) on a plain column-addition write, and the 
widening behavior is inconsistent between catalog writes and `MERGE INTO`.
   
   This PR decouples the two with an explicit, opt-in switch:
   
   | Option | Default | Effect |
   |--------|---------|--------|
   | `write.merge-schema` | false | Evolve schema for **new columns only**; existing column types are kept and incoming values are cast to them. |
   | `write.merge-schema.type-widening` | false | Additionally widen an existing column type to a wider compatible type (e.g. `INT -> BIGINT`, `DECIMAL` precision increase). |
   | `write.merge-schema.explicit-cast` | false | Additionally allow lossy casts (e.g. `BIGINT -> INT`, `STRING -> DATE`). |
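
As a rough illustration of how the three switches stack, here is a minimal Python sketch that builds the writer option map. The helper name `merge_schema_options` is hypothetical, and the sketch assumes (from the word "Additionally" in the table) that the two extra switches only matter when `write.merge-schema` is enabled; only the option keys come from this PR.

```python
def merge_schema_options(type_widening: bool = False,
                         explicit_cast: bool = False) -> dict:
    """Build a writer option map for a schema-evolving write.

    Hypothetical helper for illustration only. It always enables
    `write.merge-schema` and layers the two opt-in switches on top,
    assuming they are only meaningful together with it.
    """
    opts = {"write.merge-schema": "true"}
    if type_widening:
        # Opt in to widening existing column types (e.g. INT -> BIGINT).
        opts["write.merge-schema.type-widening"] = "true"
    if explicit_cast:
        # Opt in to lossy casts (e.g. BIGINT -> INT).
        opts["write.merge-schema.explicit-cast"] = "true"
    return opts


if __name__ == "__main__":
    print(merge_schema_options(type_widening=True))
```

With PySpark, a map like this could be passed to the writer, for example `df.write.format("paimon").options(**opts).mode("append").save(path)`, where `df` and `path` are placeholders.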
   
   Core: the `typeWidening` flag is threaded through `FileStore#mergeSchema` -> 
`SchemaManager#mergeSchema` -> `SchemaMergingUtils#merge`. Leaf types are kept 
unchanged when `typeWidening=false`; complex types (Row/Array/Map/Multiset) 
still recurse, so nested column additions keep working.
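
The merge rule described above can be pictured with a toy Python model. This is a sketch under stated assumptions, not the code in `SchemaMergingUtils#merge`: row types are modeled as plain dicts, leaf types as strings, the widening order is an illustrative integer-only lattice, and Array/Map/Multiset are left out for brevity.

```python
# Toy widening lattice (assumption): a leaf may widen only to a type
# that appears later in this list.
WIDENING_ORDER = ["TINYINT", "SMALLINT", "INT", "BIGINT"]


def can_widen(existing: str, incoming: str) -> bool:
    """True if `incoming` is strictly wider than `existing` in the toy lattice."""
    return (existing in WIDENING_ORDER
            and incoming in WIDENING_ORDER
            and WIDENING_ORDER.index(incoming) > WIDENING_ORDER.index(existing))


def merge(existing, incoming, type_widening: bool):
    """Merge an incoming type into an existing one.

    Row types (dicts) always recurse, so new nested fields are added
    regardless of the flag. Leaf types (strings) change only when
    `type_widening` is on and the toy lattice allows the widening.
    """
    if isinstance(existing, dict) and isinstance(incoming, dict):
        merged = {}
        for name, field_type in existing.items():
            if name in incoming:
                merged[name] = merge(field_type, incoming[name], type_widening)
            else:
                merged[name] = field_type
        for name, field_type in incoming.items():
            if name not in merged:
                merged[name] = field_type  # column addition
        return merged
    if isinstance(existing, str) and isinstance(incoming, str):
        if type_widening and can_widen(existing, incoming):
            return incoming
        return existing  # leaf kept unchanged
    # Toy choice: a row/leaf mismatch is treated as an error.
    raise ValueError(f"cannot merge {incoming!r} into {existing!r}")


if __name__ == "__main__":
    table = {"id": "INT", "info": {"name": "STRING"}}
    data = {"id": "BIGINT", "info": {"name": "STRING", "age": "INT"}, "ts": "TIMESTAMP"}
    print(merge(table, data, type_widening=False))
    print(merge(table, data, type_widening=True))
```

In this model, a write that adds `info.age` and `ts` while carrying a `BIGINT` `id` leaves `id` as `INT` when the flag is off and widens it to `BIGINT` only when the flag is on; the new columns are added in both cases.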
   
   Spark: schema-evolution logic is consolidated into `SchemaEvolutionHelper`, 
and the schema commit for catalog writes (V1/V2) and `MERGE INTO` is deferred 
to execution. Analysis only evolves the schema in memory, which keeps the 
analyzer free of side effects. Behavior is now consistent across catalog writes 
and `MERGE INTO`.
   
   Docs (`docs/docs/spark/sql-write.md`) are updated with the three options and 
a column-alignment-by-write-path table.
   
   ### Tests
   
   - `SchemaMergingUtilsTest` — typeWidening on/off matrix (core)
   - `WriteMergeSchemaTest` / `V2WriteMergeSchemaTest`
   - `DataFrameWriteTest` — schema-evolution groups across pk / bucket / 
format, plus explicit-cast
   - `MergeIntoPrimaryKeyBucketedTableTest`, `PaimonSinkTest` (streaming schema 
evolution)
   - Full Spark 3.5 suite green


