Zouxxyy opened a new pull request, #8042: URL: https://github.com/apache/paimon/pull/8042
### Purpose

Today `write.merge-schema` couples column-addition with *unconditional* type widening. That has two problems: it can attempt unsupported widenings (e.g. `ARRAY<INT>` -> `ARRAY<BIGINT>`) on a plain column-addition write, and the widening behavior is inconsistent between catalog writes and `MERGE INTO`.

This PR decouples the two with an explicit, opt-in switch:

| Option | Default | Effect |
|--------|---------|--------|
| `write.merge-schema` | false | Evolve schema for **new columns only**; existing column types are kept and incoming values are cast to them. |
| `write.merge-schema.type-widening` | false | Additionally widen an existing column type to a wider compatible type (e.g. `INT -> BIGINT`, `DECIMAL` precision increase). |
| `write.merge-schema.explicit-cast` | false | Additionally allow lossy casts (e.g. `BIGINT -> INT`, `STRING -> DATE`). |

Core: the `typeWidening` flag is threaded through `FileStore#mergeSchema` -> `SchemaManager#mergeSchema` -> `SchemaMergingUtils#merge`. Leaf types are kept unchanged when `typeWidening=false`; complex types (Row/Array/Map/Multiset) still recurse, so nested column additions keep working.

Spark: schema-evolution logic is consolidated into `SchemaEvolutionHelper`, and the schema commit for catalog writes (V1/V2) and `MERGE INTO` is deferred to execution: analysis only evolves the schema in memory, keeping the analyzer side-effect-free. Behavior is now consistent across catalog writes and MERGE.

Docs (`docs/docs/spark/sql-write.md`) are updated with the three options and a column-alignment-by-write-path table.

### Tests

- `SchemaMergingUtilsTest`: typeWidening on/off matrix (core)
- `WriteMergeSchemaTest` / `V2WriteMergeSchemaTest`
- `DataFrameWriteTest`: schema-evolution groups across pk / bucket / format, plus explicit-cast
- `MergeIntoPrimaryKeyBucketedTableTest`, `PaimonSinkTest` (streaming schema evolution)
- Full Spark 3.5 suite green
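For readers who want a concrete picture of the merge rule described under "Core", here is a minimal toy model in Python. It is not Paimon's code: the `Leaf`, `Array`, and `Row` classes, the `WIDENING_RANK` table, and the `merge` function are all hypothetical names invented for this sketch, and it does not model which widenings Paimon actually supports or how unsupported ones are rejected. It only illustrates the stated behavior: leaf types are kept when widening is off, while complex types still recurse so that nested column additions are picked up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    name: str  # e.g. "INT", "BIGINT", "STRING"


@dataclass(frozen=True)
class Array:
    element: object  # element type


@dataclass(frozen=True)
class Row:
    fields: tuple  # tuple of (field_name, type) pairs, in column order


# Assumption for this sketch only: a tiny integer widening order.
WIDENING_RANK = {"TINYINT": 0, "SMALLINT": 1, "INT": 2, "BIGINT": 3}


def merge(current, incoming, type_widening=False):
    """Merge `incoming` into `current`, toy version of the rule in the PR text."""
    if isinstance(current, Row) and isinstance(incoming, Row):
        merged = dict(current.fields)  # dict keeps insertion (column) order
        for name, new_type in incoming.fields:
            if name in merged:
                # Existing column: recurse, so nested additions still apply.
                merged[name] = merge(merged[name], new_type, type_widening)
            else:
                # New column: always added.
                merged[name] = new_type
        return Row(tuple(merged.items()))
    if isinstance(current, Array) and isinstance(incoming, Array):
        return Array(merge(current.element, incoming.element, type_widening))
    if isinstance(current, Leaf) and isinstance(incoming, Leaf):
        if not type_widening:
            return current  # leaf type kept unchanged when widening is off
        a = WIDENING_RANK.get(current.name)
        b = WIDENING_RANK.get(incoming.name)
        if a is not None and b is not None and b > a:
            return incoming  # strictly wider type wins
        return current
    # Mismatched kinds: this sketch simply keeps the existing type.
    return current


current = Row((
    ("id", Leaf("INT")),
    ("tags", Array(Row((("k", Leaf("STRING")),)))),
))
incoming = Row((
    ("id", Leaf("BIGINT")),
    ("tags", Array(Row((("k", Leaf("STRING")), ("v", Leaf("INT")))))),
    ("extra", Leaf("STRING")),
))

widening_off = merge(current, incoming, type_widening=False)
widening_on = merge(current, incoming, type_widening=True)
```

In this toy run, both results gain the top-level `extra` column and the nested `v` field; only `widening_on` changes `id` from `INT` to `BIGINT`.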
