[PR] fix: configurable fallback when parquet vectorized reader is disabled (#4352) [datafusion-comet]

via GitHub Sat, 16 May 2026 17:58:31 -0700


andygrove opened a new pull request, #4355:
URL: https://github.com/apache/datafusion-comet/pull/4355


   ## Which issue does this PR close?
   
   Closes #4352.
   
   ## Rationale for this change
   
   Comet's `native_datafusion` scan rejects the same Parquet-to-Spark type 
conversions that Spark's vectorized reader rejects. Spark's parquet-mr 
(non-vectorized) path instead accepts them and silently overflows or produces 
nulls. Setting `spark.sql.parquet.enableVectorizedReader` to `false` therefore 
opts into parquet-mr semantics that Comet has no equivalent for, so by default 
Comet should fall back to Spark in that case. Users who want Comet to handle 
the scan regardless can opt in.
   
   ## What changes are included in this PR?
   
   - New config `spark.comet.scan.allowDisabledParquetVectorizedReader` 
(default `false` → fall back to Spark when vectorized reader is disabled).
   - `CometScanRule.nativeDataFusionScan` declines the scan, leaving it to 
Spark, when the vectorized reader is disabled and the opt-in flag is `false`.
   - `CometTestBase` sets the flag to `true` so existing Comet tests continue 
to exercise the native scan.
   - Re-enables (un-ignores) the affected `ParquetTypeWideningSuite` tests in 
the 4.0.2 and 4.1.1 diffs.
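
The fallback decision described in the list above can be modeled as a small pure function. This is only an illustrative sketch, not the code in `CometScanRule`: the helper names `_as_bool` and `use_native_datafusion_scan` are hypothetical, and the configuration is represented as a plain dict rather than a Spark session. The two config keys and the `false` default for the new flag come from the PR description; the `true` default for `spark.sql.parquet.enableVectorizedReader` is Spark's documented default.

```python
# Illustrative model of the fallback decision; NOT the actual Comet code.
# Helper names are hypothetical; config keys are taken from the PR text.

VECTORIZED_READER_KEY = "spark.sql.parquet.enableVectorizedReader"
ALLOW_DISABLED_KEY = "spark.comet.scan.allowDisabledParquetVectorizedReader"


def _as_bool(conf, key, default):
    """Read a boolean-like setting from a dict of config values."""
    return str(conf.get(key, default)).strip().lower() == "true"


def use_native_datafusion_scan(conf):
    """Return True if the native scan may run, False to fall back to Spark."""
    vectorized_enabled = _as_bool(conf, VECTORIZED_READER_KEY, True)
    allow_when_disabled = _as_bool(conf, ALLOW_DISABLED_KEY, False)
    # Fall back only when the vectorized reader is off AND the user has
    # not opted in to letting Comet handle the scan anyway.
    return vectorized_enabled or allow_when_disabled
```

Under this model, disabling the vectorized reader alone causes a fallback to Spark, while additionally setting the opt-in flag to `true` (as `CometTestBase` does) keeps the native scan in play.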
   
   This PR is stacked on the in-progress `native-df-type-promotion-validation` 
branch, so the diff includes that surrounding work; the 4352-specific changes 
are the last two commits on the branch.
   
   ## How are these changes tested?
   
   Existing test suites — the previously ignored `ParquetTypeWideningSuite` 
tests are now exercised on Spark 4.0 and 4.1 via the parquet-mr fallback path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


