andygrove opened a new pull request, #4405: URL: https://github.com/apache/datafusion-comet/pull/4405
## Which issue does this PR close?

N/A. This adds local developer tooling and has no associated issue.

## Rationale for this change

The `spark_sql_test.yml` workflow runs Apache Spark's own SQL test suites with Comet enabled, but there is no convenient way to reproduce that run on a developer machine. Debugging a Spark SQL test failure currently means reconstructing the steps by hand: clone Spark at a version tag, apply the Comet diff, build Comet, and run the correct `build/sbt` shard with the matching environment.

## What changes are included in this PR?

New bash scripts under `dev/ci/spark-sql-tests/` that reproduce the `spark_sql_test.yml` workflow locally for Apache Spark 4.1:

- `config.sh`: shared configuration and the seven CI module-shard definitions, copied from `spark_sql_test.yml`.
- `setup-spark.sh`: maintains a persistent `apache/spark` checkout and applies `dev/diffs/4.1.1.diff`, preserving Spark's build artifacts across runs.
- `run.sh`: builds Comet, runs the selected module shard(s) with `build/sbt` using the same environment as CI, and prints a PASS/FAIL summary. Supports `SKIP_BUILD` and `SKIP_SPARK_SETUP` for fast iteration.
- `README.md`: usage, prerequisites, and environment variables.

Only Spark 4.1 is supported for now; the scripts are structured so that a later change can parameterize the version.

## How are these changes tested?

These scripts orchestrate a multi-hour external test run, so they are not exercised end-to-end in CI. They were verified with `bash -n` and `shellcheck -x` (both clean) and with smoke tests of `run.sh` argument handling (`--help` and rejection of unknown modules). The module definitions and `build/sbt` arguments match `spark_sql_test.yml` exactly.
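The shard-runner behavior described for `run.sh` can be sketched roughly as follows. This covers running each selected shard, rejecting unknown modules, and printing a PASS/FAIL summary. It is a hypothetical illustration, not code from this PR. The shard names `shard-a`/`shard-b` and the functions `validate_modules` and `run_shards` are invented for the example. The stand-in commands `true`/`false` replace the real `build/sbt` invocations so the sketch runs without a Spark checkout. It assumes bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only: shard names and commands are placeholders, not the
# definitions in config.sh. The real run.sh is described as invoking build/sbt
# with the CI environment; `true`/`false` stand in here so this runs anywhere.
set -u

# Shard name -> command to run (requires bash 4+ for associative arrays).
declare -A SHARDS=(
  [shard-a]="true"
  [shard-b]="false"
)

# Fail fast on any module name that has no shard definition.
validate_modules() {
  local m
  for m in "$@"; do
    if [[ -z "$m" ]] || [[ -z "${SHARDS[$m]+set}" ]]; then
      echo "unknown module: '$m'" >&2
      return 1
    fi
  done
}

# Run every selected shard even if an earlier one fails, then print one
# "PASS name" or "FAIL name" line per shard. Returns 2 for unknown modules,
# 1 if any shard failed, 0 otherwise.
run_shards() {
  local m status failed=0
  local -a summary=()
  validate_modules "$@" || return 2
  for m in "$@"; do
    if bash -c "${SHARDS[$m]}"; then
      status=PASS
    else
      status=FAIL
      failed=1
    fi
    summary+=("$status $m")
  done
  printf '%s\n' "${summary[@]}"
  return "$failed"
}
```

In this sketch, `run_shards shard-a shard-b` prints `PASS shard-a` followed by `FAIL shard-b` and returns 1. The sketch collects every result before reporting, rather than stopping at the first failure. For a multi-hour run, this keeps one failing shard from hiding the results of the others.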
