andygrove opened a new pull request, #4363:
URL: https://github.com/apache/datafusion-comet/pull/4363

   ## Which issue does this PR close?
   
   Part of #4020.
   
   ## Rationale for this change
   
   With #4019 restricting the Comet Parquet scan to `native_datafusion`, the 
`spark.comet.scan.impl` config and the three scan-impl name constants are 
deprecated stubs that no longer affect behavior. Removing them simplifies the 
scan code path (no more impl conditional in `CometExecRule`, no impl parameter 
on `CometScanExec` / `CometScanTypeChecker`) and reduces the public Comet 
config surface area.
   
   **This PR is stacked on #4019** (branched off `remove-iceberg-compat`). Held 
as a draft until #4019 is merged, after which this branch will be rebased onto 
`main`. The diff currently shows both #4019's changes and this PR's; once #4019 
lands, only this PR's changes will remain.
   
   ## What changes are included in this PR?
   
   **`CometConf`:** drop `COMET_NATIVE_SCAN_IMPL`, `SCAN_NATIVE_DATAFUSION`, 
`SCAN_NATIVE_ICEBERG_COMPAT`, `SCAN_AUTO`.
   
   **Main code:**
   - `CometScanExec`: remove `scanImpl: String` field & companion `apply` 
parameter; drop the `assert(scanImpl != CometConf.SCAN_AUTO)`; simplify 
`nodeName` (no `[native_datafusion]` suffix); collapse `supportedDataFilters` 
(the impl conditional is now always-true).
   - `CometScanRule`: rephrase user-facing fallback messages from 
\`\$SCAN_NATIVE_DATAFUSION scan\` to neutral \"Native Parquet scan\"; drop 
`scanImpl` parameter from `isSchemaSupported` and `CometScanTypeChecker`.
   - `CometExecRule`: collapse `case scan: CometScanExec if scan.scanImpl == 
SCAN_NATIVE_DATAFUSION` to `case scan: CometScanExec`.
   
   **Tests:** remove every `withSQLConf(COMET_NATIVE_SCAN_IMPL.key -> ...)` 
call across 22 test files. Drop the dead pre-Spark-4 `TimestampLTZ as 
TimestampNTZ` test in `ParquetTimestampLtzAsNtzSuite` (its `assume` was always 
skipping post-#4019). Simplify `ParquetEncryptionITCase` (no longer needs the 
Comet/Spark branch split that toggled the scan impl).
   
   **Benchmarks:** drop the `COMET_NATIVE_SCAN_IMPL` map entries and the 
`(\$SCAN_NATIVE_DATAFUSION)` suffix in `CometBenchmarkBase`. Remove 
`spark.comet.scan.impl` lines from `benchmarks/tpc/engines/comet.toml` and 
`comet-hashjoin.toml`.
   
   A separate docs cleanup is in #4362 (also stacked, based on `main`).
   
   ## How are these changes tested?
   
   Spotless + Scala compile clean. Targeted suites run locally on `spark-4.1` 
profile:
   
   - `CometIfSuite`: 1 / 1 pass
   - `CometScanRuleSuite`: 2 / 2 pass
   - `CometNativeReaderSuite`: 50 / 50 pass
   
   Wider CI coverage will run on this PR's GitHub Actions once the base PR 
(#4019) is merged and this is rebased onto `main`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to