andygrove opened a new issue, #3760: URL: https://github.com/apache/datafusion-comet/issues/3760
## Description

When running Spark SQL tests with the `native_datafusion` scan, tests that expect errors for duplicate or ambiguous fields in case-insensitive mode fail because DataFusion's Parquet reader doesn't enforce Spark's case-sensitivity validation rules.

## Affected Tests

### `Spark native readers should respect spark.sql.caseSensitive` (FileBasedDataSourceSuite)

Writes a Parquet file with columns `A`, `b`, `B`, then reads it with `caseSensitive=false`. Spark expects a `SparkException` when selecting `b` (ambiguous between `b` and `B`), but `native_datafusion` reads without error.

### `SPARK-25207: exception when duplicate fields in case-insensitive mode` (ParquetFilterSuite, V1 and V2)

Writes Parquet with columns `A`, `B`, `b`, then reads it with `caseSensitive=false`. Spark expects a `SparkException` with cause `RuntimeException` containing `Found duplicate field(s) "B": [B, b]`. The native reader either doesn't detect the duplicate, or wraps the error with a different exception type/cause than expected.

## Context

PR #3687 added a fallback from `native_datafusion` for duplicate fields in case-insensitive mode, avoiding the test failures by falling back to the Spark reader. These tests remain ignored because the native reader itself doesn't implement the validation.

## Related

- #3687: fall back from native_datafusion for duplicate fields in case-insensitive mode
- Split from #3311
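
For illustration only, the validation that the affected tests expect could look roughly like the sketch below. It is written in Python for brevity and is not Spark's or Comet's actual code. The helper name `resolve_field` and its parameters are hypothetical; only the error message format is taken from the SPARK-25207 test expectation quoted in this issue. The sketch raises a plain `RuntimeError` and does not model the outer `SparkException` wrapping.

```python
def resolve_field(file_fields, requested, case_sensitive=False):
    """Return the Parquet field name matching `requested`, or None if absent.

    Hypothetical helper: in case-insensitive mode, more than one match is
    reported as an error instead of silently picking one of the candidates.
    """
    if case_sensitive:
        # Exact match only; `b` and `B` are distinct fields.
        return requested if requested in file_fields else None

    # Collect every file field that matches ignoring case, in file order.
    wanted = requested.lower()
    matches = [name for name in file_fields if name.lower() == wanted]
    if len(matches) > 1:
        listed = ", ".join(matches)
        raise RuntimeError(f'Found duplicate field(s) "{requested}": [{listed}]')
    return matches[0] if matches else None


if __name__ == "__main__":
    fields = ["A", "B", "b"]  # columns written by the SPARK-25207 test
    print(resolve_field(fields, "a"))  # unambiguous: resolves to "A"
    try:
        resolve_field(fields, "B")
    except RuntimeError as err:
        print(err)  # ambiguous between "B" and "b"
```

With `case_sensitive=True` the same lookup for `b` succeeds, which mirrors the behavior the tests check when `spark.sql.caseSensitive` is enabled.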
