andygrove opened a new pull request, #3576: URL: https://github.com/apache/datafusion-comet/pull/3576
## Summary

- Add Docker Compose setup for running TPC-H/TPC-DS benchmarks in an isolated Spark standalone cluster (2 workers)
- Bundle TPC-H and TPC-DS query SQL files in the repository, removing the need for external `TPCH_QUERIES`/`TPCDS_QUERIES` env vars
- Add `Dockerfile.build-comet` for cross-compiling Comet JARs with Linux native libraries on macOS
- Consolidate and improve benchmark runner scripts (`run.py`, `tpcbench.py`) with bundled query support and flexible data path layouts
- Update README with Docker setup, platform notes (macOS/Linux), and Iceberg benchmarking docs

## Test plan

- [ ] Build Docker image on Linux: `docker build -t comet-bench -f benchmarks/tpc/infra/docker/Dockerfile .`
- [ ] Start cluster: `docker compose -f benchmarks/tpc/infra/docker/docker-compose.yml up -d`
- [ ] Run TPC-H benchmark: `docker compose run --rm bench python3 /opt/benchmarks/run.py --engine comet --benchmark tpch --no-restart`
- [ ] Verify `--dry-run` works: `python3 run.py --engine comet --benchmark tpch --dry-run`
- [ ] Verify bundled queries are used (no `TPCH_QUERIES` env var needed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
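For readers wondering what "bundled query support" could look like in a runner script, here is a minimal sketch. It is not taken from the PR: the function name `resolve_query_dir`, the `queries/<benchmark>` directory layout, and the idea that `TPCH_QUERIES`/`TPCDS_QUERIES` remain as optional overrides are all assumptions made for illustration. The actual logic in `run.py` and `tpcbench.py` may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumption: the env vars named in the PR description stay usable as
# optional overrides. The PR only states that they are no longer required.
ENV_OVERRIDES = {"tpch": "TPCH_QUERIES", "tpcds": "TPCDS_QUERIES"}


def resolve_query_dir(
    benchmark: str,
    base_dir: Path,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return the directory holding the SQL files for `benchmark`.

    Hypothetical helper: prefers an explicit env var override and otherwise
    falls back to queries bundled under `<base_dir>/queries/<benchmark>`
    (an assumed layout, not confirmed by the PR).
    """
    if benchmark not in ENV_OVERRIDES:
        raise ValueError(f"unknown benchmark: {benchmark!r}")

    env = os.environ if env is None else env
    override = env.get(ENV_OVERRIDES[benchmark])
    if override:
        # An explicit path supplied by the user wins over the bundled copy.
        return Path(override)

    # Default: use the query files shipped in the repository.
    return base_dir / "queries" / benchmark
```

With an empty environment, `resolve_query_dir("tpch", Path("/opt/benchmarks"), {})` resolves to the bundled directory, which is the behavior the last test-plan item checks for.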
