andygrove opened a new pull request, #4258:
URL: https://github.com/apache/datafusion-comet/pull/4258
## Which issue does this PR close?
Closes #.
## Rationale for this change
The microbenchmarks added a "Comet (Scan)" case (`COMET_ENABLED=true`,
`COMET_EXEC_ENABLED=false`) intending to isolate scan performance from operator
performance. With `spark.comet.scan.impl=auto` (the default),
`CometScanRule.nativeDataFusionScan` refuses to install when exec is disabled,
so the case actually measures `native_iceberg_compat` scan + Spark
`ColumnarToRow`. Comparing it against the other Comet case (which uses
`native_datafusion` + `CometNativeColumnarToRow`) makes the result a proxy for
scan-impl choice rather than the intended scan-vs-scan+exec isolation. The
numbers are confusing rather than informative. The rlike microbenchmark is a
clear example: the Project falls back in both Comet cases, yet "Comet
(Scan + Exec)" still comes out ~3x ahead of "Comet (Scan)" purely because of the
upstream scan and columnar-to-row (c2r) difference.
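
To make the mismatch concrete, here is a minimal, self-contained sketch of the two cases as plain config maps. It is not the benchmark code. The keys `spark.comet.enabled` and `spark.comet.exec.enabled` are assumed to be what `COMET_ENABLED` and `COMET_EXEC_ENABLED` resolve to, and `effectiveScan` is a hypothetical stand-in that models only the `auto` behavior described above.

```scala
// Illustrative model only; BenchCase and effectiveScan are hypothetical names.
object ScanCaseSketch {
  final case class BenchCase(name: String, conf: Map[String, String])

  // The case this PR removes: Comet on, exec off, scan impl left at the default.
  val cometScanOnly: BenchCase = BenchCase(
    "Comet (Scan)",
    Map(
      "spark.comet.enabled" -> "true",       // assumed key for COMET_ENABLED
      "spark.comet.exec.enabled" -> "false", // assumed key for COMET_EXEC_ENABLED
      "spark.comet.scan.impl" -> "auto"))

  // The case this PR keeps (renamed to "Comet").
  val cometScanExec: BenchCase = BenchCase(
    "Comet (Scan + Exec)",
    Map(
      "spark.comet.enabled" -> "true",
      "spark.comet.exec.enabled" -> "true",
      "spark.comet.scan.impl" -> "auto"))

  // Simplified stand-in for the behavior described in the rationale: under
  // `auto`, native_datafusion is not installed when exec is disabled, so the
  // scan ends up as native_iceberg_compat.
  def effectiveScan(c: BenchCase): String = {
    val execEnabled =
      c.conf.getOrElse("spark.comet.exec.enabled", "true").toBoolean
    c.conf.getOrElse("spark.comet.scan.impl", "auto") match {
      case "auto" =>
        if (execEnabled) "native_datafusion" else "native_iceberg_compat"
      case other => other // explicit settings are not modeled here
    }
  }

  def main(args: Array[String]): Unit =
    Seq(cometScanOnly, cometScanExec).foreach { c =>
      println(s"${c.name}: ${effectiveScan(c)}")
    }
}
```

Under this model the two cases differ in scan implementation as well as in exec, which is why comparing them does not isolate operator performance.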
## What changes are included in this PR?
- `CometBenchmarkBase.runExpressionBenchmark`: drop the `Comet (Scan)` case,
rename `Comet (Scan + Exec)` to `Comet`, update the scaladoc.
- `CometExecBenchmark`: drop five `SQL Parquet - Comet (Scan)` cases, rename
`SQL Parquet - Comet (Scan, Exec)` to `SQL Parquet - Comet` (including the
BloomFilterAgg variant). The `SQL Parquet - Spark (Scan), Comet (Exec)` case in
the Project+Filter benchmark stays: it forces a different scan source and is a
meaningfully different config.
- Doc-comment fixes in `CometStringExpressionBenchmark`,
`CometCsvExpressionBenchmark`, `CometJsonExpressionBenchmark` ("scan+exec case"
-> "Comet case").
The plan-not-fully-Comet warning logic in `runExpressionBenchmark` is
retained: it still surfaces fallbacks (rlike, regexp_replace, etc.) and is more
useful with the simpler case list.
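
As a rough illustration of what such a warning does, the sketch below flags operators that are not Comet operators in a list of plan node names. It is a hypothetical simplification, not the logic in `runExpressionBenchmark`: the name-prefix check, the operator names, and the message format are all assumptions made for the example.

```scala
// Hypothetical simplification of a "plan is not fully Comet" check.
object FallbackWarningSketch {
  // Treat any operator whose name does not start with "Comet" as a fallback.
  // A real check would inspect the physical plan and allow wrapper nodes.
  def nonCometOperators(operators: Seq[String]): Seq[String] =
    operators.filterNot(_.startsWith("Comet"))

  // Returns a warning message when at least one operator fell back to Spark.
  def warning(caseName: String, operators: Seq[String]): Option[String] = {
    val fallbacks = nonCometOperators(operators)
    if (fallbacks.isEmpty) None
    else Some(
      s"WARNING: '$caseName' plan is not fully Comet; " +
        s"fell back: ${fallbacks.mkString(", ")}")
  }

  def main(args: Array[String]): Unit = {
    // Example: an rlike-style case where the Project falls back to Spark.
    val plan = Seq("CometScan", "Project")
    warning("Comet", plan).foreach(println)
  }
}
```

With a single `Comet` case per benchmark, a warning of this kind points directly at the expression that fell back rather than at a difference between two Comet configurations.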
## How are these changes tested?
These are benchmarks; the change is renaming/removing benchmark cases.
Verified with `./mvnw test-compile` and `./mvnw scalastyle:check`. No behavior
change to production code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]