[PR] [spark] Fix schema merge creating duplicate columns with case-mismatched names [paimon]

via GitHub Fri, 29 May 2026 01:15:48 -0700


Zouxxyy opened a new pull request, #8034:
URL: https://github.com/apache/paimon/pull/8034


   ### Purpose
   
   When `merge-schema` is enabled and a source column name differs from a target
column name only in case (e.g. source `ID` vs target `id`), `SchemaMergingUtils`
treats the source column as new because it looks fields up in a case-sensitive
`HashMap`. The merged schema then contains duplicate columns, which makes the
table unreadable (`Field names must be unique`).
   
   This PR threads a `caseSensitive` parameter through the schema merge chain
(`SchemaMergingUtils` → `SchemaManager` → `FileStore` → Spark `SchemaHelper`)
and uses `TreeMap(String.CASE_INSENSITIVE_ORDER)` for field matching when
`caseSensitive=false`. Spark callers pass the value of the
`spark.sql.caseSensitive` config (default `false`).
   
   This affects both the `INSERT ... merge-schema=true` and the `MERGE INTO ...
merge-schema=true` paths.
   
   ### Tests
   
   Added 13 case-sensitivity tests in `WriteMergeSchemaTest` covering:
   - INSERT and MERGE INTO with case-mismatched column names
   - Nested struct fields with case mismatch
   - Schema unchanged when only case differs (no new columns)
   - Repeated writes with alternating case
   - Mixed case-mismatch with genuinely new columns
   - Case-sensitive mode still treats names that differ only in case as new columns


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
