This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b96b63350c3c [SPARK-57069][INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs
b96b63350c3c is described below
commit b96b63350c3c153b10452108dbd892069f7be0f4
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu May 28 16:14:46 2026 +0800
[SPARK-57069][INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs
### What changes were proposed in this pull request?
This PR extends the SBT precompile-sharing pattern (parent:
[SPARK-56830](https://issues.apache.org/jira/browse/SPARK-56830); prior
sub-tasks: [SPARK-56768](https://issues.apache.org/jira/browse/SPARK-56768)
for pyspark, [SPARK-56831](https://issues.apache.org/jira/browse/SPARK-56831)
for sparkr, and [SPARK-56943](https://issues.apache.org/jira/browse/SPARK-56943)
for the JVM build) to the two remaining SBT-compiling jobs in
`.github/workflows/build_and_test.yml` that still run their own full Spark
compile:
- `docker-integration-tests`
- `k8s-integration-tests`
Concretely:
- The existing `precompile` job's `if:` gate is extended to also fire when
`docker-integration-tests == 'true'` or `k8s-integration-tests == 'true'` in
the precondition output, so the artifact is available whenever either job needs
it.
- The precompile SBT invocation adds `-Pkubernetes-integration-tests`, so
the integration-tests submodule's `target/` ends up in the shared artifact and
the k8s job doesn't have to recompile it.
- `docker-integration-tests`:
  - `needs: precondition` -> `needs: [precondition, precompile]`
  - `if:` is prefixed with `(!cancelled()) &&` so the job is not skipped by
    the implicit `success()` status check when precompile does not succeed
    (for example, when precompile is cancelled).
  - Adds "Download precompiled artifact" and "Extract precompiled artifact"
    steps between Java setup and `Run tests`, with graceful fallback
    (`continue-on-error: true`).
  - `Run tests` exports `SKIP_SCALA_BUILD=true` when extraction succeeded;
    `dev/run-tests.py` already honors this flag and skips `build_apache_spark`
    and `build_spark_assembly_sbt`.
- `k8s-integration-tests`:
  - Same `needs:` and `if:` change.
  - Adds the same Download/Extract steps after Java setup.
  - The actual test runs via a direct `build/sbt ...
    "kubernetes-integration-tests/test"` call rather than `dev/run-tests.py`,
    so `SKIP_SCALA_BUILD` is not set. SBT sees the extracted `target/` and skips
    compilation of the pre-built modules (Spark Core, SQL, kubernetes,
    integration-tests, ...); only the small SparkR Scala bindings still compile
    (the precompile doesn't include `-Psparkr` because that profile activates
    `core/buildRPackage`, which shells out to R, and the precompile runne [...]
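The gate extension in the first bullet above is a plain OR over the precondition job's per-module flags. Below is a minimal POSIX shell sketch of that rule. It assumes the `required` output is passed as hypothetical `key=value` arguments instead of JSON. It lists only the consumer keys visible in the diff hunk, and the real condition may contain terms the hunk does not show.

```sh
#!/bin/sh
# Hypothetical sketch of the precompile gate: run precompile if ANY consumer
# job is required. Inputs are "key=value" pairs standing in for the
# precondition job's `required` JSON output (an assumption for illustration).
# Only the consumer keys visible in this diff hunk are listed.
PRECOMPILE_CONSUMERS="pyspark pyspark-pandas pyspark-install sparkr docker-integration-tests k8s-integration-tests"

should_precompile() {
  local pair key value consumer
  for pair in "$@"; do
    key=${pair%%=*}
    value=${pair#*=}
    # Default IFS word splitting turns the list into one consumer per iteration.
    for consumer in $PRECOMPILE_CONSUMERS; do
      # Same comparison as the workflow: the flag must be the string 'true'.
      if [ "$key" = "$consumer" ] && [ "$value" = "true" ]; then
        return 0
      fi
    done
  done
  return 1
}

# Usage: only the docker job is required, which is now enough to run precompile.
if should_precompile docker-integration-tests=true sparkr=false; then
  echo "precompile: run"
else
  echo "precompile: skip"
fi
```

Under this rule, requiring only `docker-integration-tests` or only `k8s-integration-tests` now starts `precompile`, so the artifact exists whenever either consumer needs it.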
### Optional: graceful fallback if precompile fails
Same pattern as the prior sub-tasks:
- `precompile` keeps `continue-on-error: true`.
- Both consumers' "Download precompiled artifact" step is gated on
`needs.precompile.result == 'success'` and has `continue-on-error: true`.
- "Extract precompiled artifact" is gated on the download succeeding and
has `continue-on-error: true`.
- For docker, `SKIP_SCALA_BUILD=true` is exported only when
`steps.extract-precompiled.outcome == 'success'`; otherwise `dev/run-tests.py`
runs the original local SBT build.
- For k8s, if extraction fails, SBT compiles from scratch as before.
In the worst case, the jobs degrade to the pre-PR behavior rather than failing the workflow.
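The fallback chain can be modeled as three gated stages. The sketch is hypothetical: `build_path` and its arguments are illustrative, and each argument is what that stage would report if it ran. Like the workflow, it checks the extract step's `outcome`. In GitHub Actions, `steps.<id>.outcome` is the result before `continue-on-error` is applied; the post-`continue-on-error` value is `conclusion`. A tolerated extraction failure therefore still reads as `failure` in `outcome`, and a step that never ran reads as `skipped`.

```sh
#!/bin/sh
# Hypothetical model of the fallback chain. Arguments: the precompile job
# result, then what the download and extract steps WOULD produce if they ran.
# The gating mirrors the `if:` conditions in the workflow.
build_path() {
  local precompile="$1" download="$2" extract="$3"
  # "Download precompiled artifact" runs only if needs.precompile.result == 'success'.
  [ "$precompile" = "success" ] || download=skipped
  # "Extract precompiled artifact" runs only if the download outcome was 'success'.
  [ "$download" = "success" ] || extract=skipped
  # "Run tests" exports SKIP_SCALA_BUILD=true only after a successful extraction.
  if [ "$extract" = "success" ]; then
    echo "reuse-precompiled"
  else
    echo "local-sbt-build"
  fi
}

for scenario in "success success success" "failure success success" "success success failure"; do
  # Intentional word splitting: each scenario string becomes three arguments.
  echo "$scenario -> $(build_path $scenario)"
done
```

Only the all-success path reaches `reuse-precompiled`. Every other combination falls through to the original local SBT build.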
### Profile coverage
The precompile job runs:
```
./build/sbt -Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive \
-Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver \
-Pdocker-integration-tests -Pkubernetes-integration-tests -Pvolcano \
Test/package streaming-kinesis-asl-assembly/assembly connect/assembly assembly/package
```
- `docker-integration-tests`: the `-Pdocker-integration-tests` profile is in
the precompile invocation, so the module's `target/` is pre-built and
`dev/run-tests --modules docker-integration-tests` only runs the test phase.
- `k8s-integration-tests`: `-Pkubernetes` and
`-Pkubernetes-integration-tests` are both in the precompile, so the
integration-tests submodule is pre-built. The job's direct SBT call adds
`-Psparkr`, which triggers compilation of the small SparkR Scala bindings on
top of the reused `target/`. Net work in this job drops from "compile all of
Spark + integration tests + sparkr" to "compile only the sparkr module".
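Both consumers rely on the archive storing `target/` directories with repository-relative paths, so that the workflow's `tar -xzf compile-artifact.tar.gz` recreates them in place inside the checkout. Below is a minimal round trip in a scratch directory. The module paths and the packaging command are assumptions, because the body of the real "Package compile output" step is not part of this diff. The sketch only demonstrates the file-level restore. Whether SBT then skips recompilation depends on its incremental-compilation state, which this sketch does not model.

```sh
#!/bin/sh
set -eu
# Scratch directory standing in for a Spark checkout (left in place for inspection).
WORK_DIR=$(mktemp -d)
cd "$WORK_DIR"

# Hypothetical compile output from two modules (paths are illustrative).
mkdir -p core/target/classes resource-managers/kubernetes/integration-tests/target/classes
echo compiled > core/target/classes/Marker.class
echo compiled > resource-managers/kubernetes/integration-tests/target/classes/Marker.class

# Hypothetical packaging side: archive every target/ dir with repo-relative paths.
# The unquoted substitution is intentional: one tar argument per directory.
tar -czf compile-artifact.tar.gz $(find . -type d -name target -prune)

# Simulate the consumer job's fresh checkout: no target/ directories yet.
find . -type d -name target -prune -exec rm -rf {} +

# Consumer side, same commands as the "Extract precompiled artifact" step.
tar -xzf compile-artifact.tar.gz
rm compile-artifact.tar.gz

# List what was restored.
find . -type f -name Marker.class | sort
```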
### Why are the changes needed?
Today, every scheduled or dispatched run of `build_and_test.yml` that
requires `docker-integration-tests` or `k8s-integration-tests` repeats the
SBT compile that `precompile` already performs for `pyspark`, `sparkr`, and
`build`. Wiring these two consumers to the existing artifact removes that
duplicate work for free, since `precompile` is already running.
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure change only.
### How was this patch tested?
The change is exercised by the CI run of this PR itself. The
Download/Extract steps log artifact size; the Run tests step prints `Reusing
precompiled artifact, skipping local SBT build.` for the docker job when the
fast path is taken. If the precompile job is forced to fail (or its artifact is
missing), both consumers fall back to the original local SBT build.
Measured CI timings before vs after are posted as a comment on this PR.
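For the docker job, you can confirm the fast path in a saved job log by searching for the exact message the `Run tests` step echoes. The helper below is hypothetical, and the function name and log files are illustrative. It does not apply to the k8s job, which prints no such message.

```sh
#!/bin/sh
# Hypothetical helper: classify a saved docker-integration-tests job log by
# whether the "Run tests" step took the fast path. The marker is the exact
# string the step echoes; how the log is obtained is out of scope here.
FAST_PATH_MARKER='Reusing precompiled artifact, skipping local SBT build.'

classify_docker_log() {
  # -F: match the marker literally (its "." is not a regex wildcard).
  if grep -qF "$FAST_PATH_MARKER" "$1"; then
    echo "fast-path"
  else
    echo "local-build"
  fi
}

# Usage with two throwaway logs (contents are placeholders).
fast_log=$(mktemp)
slow_log=$(mktemp)
printf '%s\n' "$FAST_PATH_MARKER" > "$fast_log"
printf '%s\n' 'unrelated placeholder line' > "$slow_log"
classify_docker_log "$fast_log"
classify_docker_log "$slow_log"
```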
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
Closes #56110 from zhengruifeng/share-precompile-integration-tests-dev5.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.github/workflows/build_and_test.yml | 46 +++++++++++++++++++++++++++++++-----
1 file changed, 40 insertions(+), 6 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 6c2606f62683..3d5e94ef275f 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -573,7 +573,9 @@ jobs:
fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
fromJson(needs.precondition.outputs.required).pyspark-pandas == 'true' ||
fromJson(needs.precondition.outputs.required).pyspark-install == 'true' ||
- fromJson(needs.precondition.outputs.required).sparkr == 'true')
+ fromJson(needs.precondition.outputs.required).sparkr == 'true' ||
+ fromJson(needs.precondition.outputs.required).docker-integration-tests == 'true' ||
+ fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true')
name: "Precompile Spark"
runs-on: ubuntu-latest
timeout-minutes: 60
@@ -624,7 +626,7 @@ jobs:
run: |
./build/sbt -Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive \
-Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver \
- -Pdocker-integration-tests -Pvolcano \
+ -Pdocker-integration-tests -Pkubernetes-integration-tests -Pvolcano \
Test/package streaming-kinesis-asl-assembly/assembly connect/assembly assembly/package
- name: Package compile output
run: |
@@ -1510,8 +1512,8 @@ jobs:
path: "**/target/unit-tests.log"
docker-integration-tests:
- needs: precondition
- if: fromJson(needs.precondition.outputs.required).docker-integration-tests == 'true'
+ needs: [precondition, precompile]
+ if: (!cancelled()) && fromJson(needs.precondition.outputs.required).docker-integration-tests == 'true'
name: Run Docker integration tests
runs-on: ubuntu-latest
timeout-minutes: 120
@@ -1559,9 +1561,27 @@ jobs:
with:
distribution: zulu
java-version: ${{ inputs.java }}
+ - name: Download precompiled artifact
+ id: download-precompiled
+ if: needs.precompile.result == 'success'
+ continue-on-error: true
+ uses: actions/download-artifact@v6
+ with:
+ name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+ - name: Extract precompiled artifact
+ id: extract-precompiled
+ if: steps.download-precompiled.outcome == 'success'
+ continue-on-error: true
+ run: |
+ tar -xzf compile-artifact.tar.gz
+ rm compile-artifact.tar.gz
- name: Run tests
env: ${{ fromJSON(inputs.envs) }}
run: |
+ if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
+ export SKIP_SCALA_BUILD=true
+ echo "Reusing precompiled artifact, skipping local SBT build."
+ fi
./dev/run-tests --parallelism 1 --modules docker-integration-tests --included-tags org.apache.spark.tags.DockerTest
- name: Upload test results to report
if: always()
@@ -1586,8 +1606,8 @@ jobs:
path: "**/target/unit-tests.log"
k8s-integration-tests:
- needs: precondition
- if: fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true'
+ needs: [precondition, precompile]
+ if: (!cancelled()) && fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true'
name: Run Spark on Kubernetes Integration test
runs-on: ubuntu-latest
timeout-minutes: 120
@@ -1632,6 +1652,20 @@ jobs:
with:
distribution: zulu
java-version: ${{ inputs.java }}
+ - name: Download precompiled artifact
+ id: download-precompiled
+ if: needs.precompile.result == 'success'
+ continue-on-error: true
+ uses: actions/download-artifact@v6
+ with:
+ name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+ - name: Extract precompiled artifact
+ id: extract-precompiled
+ if: steps.download-precompiled.outcome == 'success'
+ continue-on-error: true
+ run: |
+ tar -xzf compile-artifact.tar.gz
+ rm compile-artifact.tar.gz
- name: Install R
run: |
sudo apt update