andygrove opened a new pull request, #3226:
URL: https://github.com/apache/datafusion-comet/pull/3226
## Summary
- Add a `get_spark_configs()` method to the base `Benchmark` class for benchmark-specific Spark configurations
- Define the common Comet configs (enabled, logging) in Python for the `jvm` and `native` modes
- Add shuffle benchmark variants with and without native Parquet writes:
  - `shuffle-hash-native-write`: hash shuffle with Comet native Parquet writes enabled
  - `shuffle-hash-spark-write`: hash shuffle with native writes disabled (uses the Spark writer)
  - `shuffle-roundrobin-native-write`: round-robin shuffle with native writes enabled
  - `shuffle-roundrobin-spark-write`: round-robin shuffle with native writes disabled
- Add `--print-configs` CLI option to output benchmark-specific configs
- Refactor `run_all_benchmarks.sh` to use a helper function and remove duplicated configs
- Exclude `benchmarks/pyspark/**` from CI test workflows to avoid triggering tests for benchmark-only changes
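
For illustration, here is a minimal Python sketch of how a per-benchmark `get_spark_configs()` hook, the four shuffle variants, and a `--print-configs` flag could fit together. It is not the code in this PR: the class layout, the `ShuffleBenchmark` and `format_configs` helpers, the `spark` baseline mode, the `spark.comet.exec.shuffle.mode` key, and the `NATIVE_WRITE_KEY` placeholder are all hypothetical. Only `spark.comet.enabled` is a known Comet setting.

```python
import argparse
from typing import Dict, List, Optional

# HYPOTHETICAL key names: the PR summary does not list the exact Comet settings.
COMET_ENABLED_KEY = "spark.comet.enabled"
SHUFFLE_MODE_KEY = "spark.comet.exec.shuffle.mode"  # assumed key name
NATIVE_WRITE_KEY = "spark.comet.example.nativeWrite"  # placeholder, not a real key


class Benchmark:
    """Base class; subclasses override get_spark_configs() to add their own."""

    name = "base"

    def get_spark_configs(self, mode: str) -> Dict[str, str]:
        # Common Comet configs, applied only in the jvm/native modes.
        if mode in ("jvm", "native"):
            return {COMET_ENABLED_KEY: "true", SHUFFLE_MODE_KEY: mode}
        return {}


class ShuffleBenchmark(Benchmark):
    """Hypothetical shuffle benchmark parameterized by partitioning and writer."""

    def __init__(self, name: str, partitioning: str, native_write: bool):
        self.name = name
        self.partitioning = partitioning  # "hash" or "roundrobin"
        self.native_write = native_write

    def get_spark_configs(self, mode: str) -> Dict[str, str]:
        # Start from the shared configs, then add the write-path toggle.
        configs = super().get_spark_configs(mode)
        if configs:
            configs[NATIVE_WRITE_KEY] = "true" if self.native_write else "false"
        return configs


# Registry keyed by benchmark name, mirroring the four variants in the PR.
BENCHMARKS = {
    b.name: b
    for b in [
        ShuffleBenchmark("shuffle-hash-native-write", "hash", True),
        ShuffleBenchmark("shuffle-hash-spark-write", "hash", False),
        ShuffleBenchmark("shuffle-roundrobin-native-write", "roundrobin", True),
        ShuffleBenchmark("shuffle-roundrobin-spark-write", "roundrobin", False),
    ]
}


def format_configs(configs: Dict[str, str]) -> List[str]:
    # One "--conf key=value" entry per setting, sorted for stable output.
    return [f"--conf {k}={v}" for k, v in sorted(configs.items())]


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--benchmark", choices=sorted(BENCHMARKS))
    # "spark" as a plain-Spark baseline mode is an assumption.
    parser.add_argument("--mode", default="spark", choices=["spark", "jvm", "native"])
    parser.add_argument("--list-benchmarks", action="store_true")
    parser.add_argument("--print-configs", action="store_true")
    args = parser.parse_args(argv)
    if args.list_benchmarks:
        print("\n".join(sorted(BENCHMARKS)))
    elif args.print_configs and args.benchmark:
        configs = BENCHMARKS[args.benchmark].get_spark_configs(args.mode)
        print("\n".join(format_configs(configs)))


if __name__ == "__main__":
    main()
```

Under these assumptions, `python sketch.py --print-configs --benchmark shuffle-hash-native-write --mode native` prints one `--conf key=value` line per setting. That is a format a shell helper could pass directly to `spark-submit`, which keeps the per-benchmark settings in Python rather than duplicated in the script.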
## Test plan
- [ ] Run `python run_benchmark.py --list-benchmarks` to verify the new benchmarks are registered
- [ ] Run `python run_benchmark.py --print-configs --benchmark shuffle-hash-native-write --mode native` to verify the config output
- [ ] Run `./run_all_benchmarks.sh` to verify the benchmarks execute correctly
🤖 Generated with [Claude Code](https://claude.ai/code)