andygrove opened a new pull request, #4355: URL: https://github.com/apache/datafusion-comet/pull/4355
## Which issue does this PR close?

Closes #4352.

## Rationale for this change

Comet's `native_datafusion` scan rejects Parquet-to-Spark conversions that Spark's vectorized reader rejects, but Spark's parquet-mr (non-vectorized) path silently overflows / nulls. Disabling `spark.sql.parquet.enableVectorizedReader` opts into parquet-mr semantics that Comet has no equivalent for, so by default Comet should fall back to Spark in that case. Users who want Comet to handle the scan regardless can opt in.

## What changes are included in this PR?

- New config `spark.comet.scan.allowDisabledParquetVectorizedReader` (default `false` → fall back to Spark when vectorized reader is disabled).
- `CometScanRule.nativeDataFusionScan` skips itself when the vectorized reader is disabled and the opt-in flag is false.
- `CometTestBase` sets the flag to `true` so existing Comet tests continue to exercise the native scan.
- Re-enables (un-ignores) the affected `ParquetTypeWideningSuite` tests in the 4.0.2 and 4.1.1 diffs.

This PR is stacked on the in-progress `native-df-type-promotion-validation` branch, so the diff includes that surrounding work; the 4352-specific changes are the last two commits on the branch.

## How are these changes tested?

Existing test suites: the previously ignored `ParquetTypeWideningSuite` tests are now exercised on Spark 4.0 and 4.1 via the parquet-mr fallback path.
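
For readers skimming the description, the fallback rule listed under "What changes are included in this PR?" can be sketched as a small truth table. This is an illustrative Python model only, not the Scala code in `CometScanRule`: the function `should_use_native_scan`, the helper `_as_bool`, and the plain-dict stand-in for the Spark conf are hypothetical names introduced for the example. The two config keys and the `false` default for the new flag come from this description; the `true` default for the vectorized reader is Spark's standard default.

```python
# Illustrative model of the decision described in this PR; the real check
# lives in Comet's Scala CometScanRule and may differ in detail.
VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"
ALLOW_DISABLED_KEY = "spark.comet.scan.allowDisabledParquetVectorizedReader"


def _as_bool(value: str) -> bool:
    """Parse a config string such as 'true' or 'False' into a bool."""
    return value.strip().lower() == "true"


def should_use_native_scan(conf: dict) -> bool:
    """Return True if the native scan may run, False to fall back to Spark.

    `conf` is a hypothetical dict of config key -> string value.
    """
    # Spark enables the vectorized Parquet reader unless told otherwise.
    vectorized_enabled = _as_bool(conf.get(VECTORIZED_READER_KEY, "true"))
    # The opt-in flag added by this PR defaults to false.
    allow_disabled = _as_bool(conf.get(ALLOW_DISABLED_KEY, "false"))
    # Fall back only when the reader is disabled and the user has not opted in.
    return vectorized_enabled or allow_disabled
```

Under this model, a session that only sets `spark.sql.parquet.enableVectorizedReader=false` falls back to Spark, while one that also sets `spark.comet.scan.allowDisabledParquetVectorizedReader=true` keeps the native scan, which is the combination `CometTestBase` relies on.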
