[I] native_datafusion: case-insensitive mode doesn't detect duplicate/ambiguous Parquet fields [datafusion-comet]

via GitHub Sun, 22 Mar 2026 10:40:04 -0700


andygrove opened a new issue, #3760:
URL: https://github.com/apache/datafusion-comet/issues/3760


   ## Description
   
   When running Spark SQL tests with `native_datafusion` scan, tests that 
expect errors for duplicate or ambiguous fields in case-insensitive mode fail 
because DataFusion's Parquet reader doesn't enforce Spark's case-sensitivity 
validation rules.
   
   ## Affected Tests
   
   ### `Spark native readers should respect spark.sql.caseSensitive` (FileBasedDataSourceSuite)
   
   Writes a Parquet file with columns `A`, `b`, `B`, then reads it back with `spark.sql.caseSensitive=false`. Spark expects a `SparkException` when selecting `b` (ambiguous between `b` and `B`), but `native_datafusion` returns results without raising an error.
   
   ### `SPARK-25207: exception when duplicate fields in case-insensitive mode` (ParquetFilterSuite, V1 and V2)
   
   Writes a Parquet file with columns `A`, `B`, `b`, then reads it back with `spark.sql.caseSensitive=false`. Spark expects a `SparkException` whose cause is a `RuntimeException` containing `Found duplicate field(s) "B": [B, b]`. The native reader either doesn't detect the duplicate or wraps the error in a different exception type or cause than the test expects.
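For reference, here is a minimal Rust sketch of the validation these tests expect, assuming the native reader has the Parquet file's top-level field names and the requested column names as plain strings. `resolve_fields` is a hypothetical helper, not part of Comet or DataFusion, and it ignores nested struct fields. In case-sensitive mode it requires an exact match; in case-insensitive mode it groups file fields by lowercased name and fails only when a *requested* column matches more than one of them. The error text mirrors the message the SPARK-25207 test checks for; a real fix would still need to surface it to the JVM as a `SparkException` caused by a `RuntimeException`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each requested column to the index of the
/// matching Parquet field (None if the column is absent from the file).
fn resolve_fields(
    file_fields: &[&str],
    requested: &[&str],
    case_sensitive: bool,
) -> Result<Vec<Option<usize>>, String> {
    // Group file field indices by lowercased name for case-insensitive lookup.
    let mut by_lower: HashMap<String, Vec<usize>> = HashMap::new();
    for (i, name) in file_fields.iter().enumerate() {
        by_lower.entry(name.to_lowercase()).or_default().push(i);
    }

    requested
        .iter()
        .map(|req| {
            if case_sensitive {
                // Exact-name match only; `B` and `b` are distinct columns.
                return Ok(file_fields.iter().position(|f| f == req));
            }
            match by_lower.get(&req.to_lowercase()) {
                None => Ok(None),
                Some(idxs) if idxs.len() == 1 => Ok(Some(idxs[0])),
                // More than one file field matches: ambiguous, so fail.
                Some(idxs) => {
                    let names: Vec<&str> = idxs.iter().map(|&i| file_fields[i]).collect();
                    Err(format!(
                        "Found duplicate field(s) \"{}\": [{}] in case-insensitive mode",
                        req,
                        names.join(", ")
                    ))
                }
            }
        })
        .collect()
}
```

With file fields `["A", "B", "b"]`, requesting `B` case-insensitively returns an error, while requesting only `A` still succeeds. That per-column behaviour matters: a file with colliding names is not an error by itself unless the query actually selects one of the ambiguous columns.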
   
   ## Context
   
   PR #3687 added a fallback from `native_datafusion` to the Spark reader for duplicate fields in case-insensitive mode, which avoids the test failures. The tests above remain ignored for `native_datafusion`, however, because the native reader itself still doesn't implement the validation.
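By contrast, a fallback decision can be made from the schema alone. The sketch below is a hypothetical predicate, not necessarily how #3687 implements its check: it reports whether any two field names collide when lowercased. It is coarser than Spark's per-column rule, since it would fall back even for a query that reads only `A` from a file with `b` and `B`, but it is cheap to evaluate before choosing a scan implementation.

```rust
use std::collections::HashSet;

/// Hypothetical predicate: true if two or more field names are equal
/// when compared case-insensitively (e.g. `b` and `B`).
fn has_case_insensitive_duplicates(field_names: &[&str]) -> bool {
    let mut seen = HashSet::new();
    // `HashSet::insert` returns false when the lowercased name is already present.
    field_names.iter().any(|name| !seen.insert(name.to_lowercase()))
}
```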
   
   ## Related
   
   - #3687: fall back from native_datafusion for duplicate fields in case-insensitive mode
   - Split from #3311
