This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 85af75e4940a [SPARK-56943][INFRA] Share SBT precompile artifact with JVM build matrix
85af75e4940a is described below
commit 85af75e4940a297c2d871f7d49b48b13dc34fefa
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Wed May 20 09:15:54 2026 +0800
[SPARK-56943][INFRA] Share SBT precompile artifact with JVM build matrix
### What changes were proposed in this pull request?
Have each JVM `build` matrix entry consume the shared `precompile` artifact
instead of rebuilding Spark from scratch. The artifact already exists - the
`precompile` job runs today for the pyspark matrix (SPARK-56768). This PR
just wires the JVM matrix to download, extract, and set
`SKIP_SCALA_BUILD=true`, which short-circuits the two SBT build calls
(`build_spark_sbt`, `build_spark_assembly_sbt`) that each entry used to run
before its tests.
Each JVM matrix entry runs three SBT calls today:
1. `build_spark_sbt`: `Test/package`, kinesis-asl-assembly, connect-assembly
2. `build_spark_assembly_sbt`: `assembly/package`
3. `run_scala_tests_sbt`: `<module>/test` (per-entry, varies)
Calls (1) and (2) are byte-equivalent across all 9 entries (same 11
profiles, same goals) and are exactly what the `precompile` job already
produces. Call (3) still runs per entry; with mtime-preserving tar the
extracted classes look fresh to Zinc, so no recompilation happens.
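To illustrate the mtime point above, here is a minimal shell sketch that is not taken from the workflow. It packs a stand-in class file with `tar`, extracts it elsewhere, and checks that the modification time survived. The scratch directories and `Foo.class` are hypothetical names. The sketch demonstrates only the tar behavior and makes no claim about how Zinc decides what to recompile.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch layout; the real artifact is compile-artifact.tar.gz.
work="$(mktemp -d)"
mkdir -p "$work/src/target/classes" "$work/dst" "$work/dst_touched"

# Stand-in for a compiled class, given a fixed, old modification time.
orig="$work/src/target/classes/Foo.class"
touch -t 202001010005 "$orig"

# Pack once, as a precompile-style job would.
tar -czf "$work/artifact.tar.gz" -C "$work/src" .

# Default extraction restores the mtime stored in the archive.
tar -xzf "$work/artifact.tar.gz" -C "$work/dst"
# For contrast, -m resets the mtime to the extraction time.
tar -xmzf "$work/artifact.tar.gz" -C "$work/dst_touched"

# Equal mtimes: the extracted file is neither newer nor older than the original.
mtime_preserved=false
if [ ! "$work/dst/target/classes/Foo.class" -nt "$orig" ] &&
   [ ! "$work/dst/target/classes/Foo.class" -ot "$orig" ]; then
  mtime_preserved=true
fi

# With -m the extracted file carries the current time, so it is newer.
mtime_reset_by_m=false
if [ "$work/dst_touched/target/classes/Foo.class" -nt "$orig" ]; then
  mtime_reset_by_m=true
fi

echo "default extract preserved mtime: $mtime_preserved"
echo "extract with -m reset mtime:     $mtime_reset_by_m"

rm -rf "$work"
```

The extract step in this PR uses plain `tar -xzf`, which is the default, mtime-restoring form.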
Workflow changes (`.github/workflows/build_and_test.yml`):
- `precompile.if`: add `build == 'true'` so the artifact is also produced
for JVM-only changes.
- `build.needs`: add `precompile`; gate with `(!cancelled()) && ...` so the
matrix still runs if precompile failed or was skipped.
- Add `Download precompiled artifact` and `Extract precompiled artifact`
steps with `continue-on-error: true`, gated on the upstream outcomes -
same fallback pattern the pyspark matrix uses.
- In the existing "Run tests" block, export `SKIP_SCALA_BUILD=true` only
when extract succeeded. `dev/run-tests.py` already honors this env var.
If any part of the precompile chain fails, the entry falls back to the
pre-PR local-build path.
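The fallback chain described above can be modelled as a small shell function. This is a sketch rather than code from the workflow: `should_skip_scala_build` is a hypothetical helper, and its three arguments stand in for `needs.precompile.result`, `steps.download-precompiled.outcome`, and `steps.extract-precompiled.outcome`. It relies on the GitHub Actions behavior that a step's `outcome` keeps the raw result (`failure`, `skipped`) even when `continue-on-error: true` lets the job proceed.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: prints "true" only when every link of the
# precompile chain succeeded, otherwise "false" (local SBT build).
# Usage: should_skip_scala_build PRECOMPILE_RESULT DOWNLOAD_OUTCOME EXTRACT_OUTCOME
should_skip_scala_build() {
  precompile_result="$1"
  download_outcome="$2"
  extract_outcome="$3"
  if [ "$precompile_result" = "success" ] &&
     [ "$download_outcome" = "success" ] &&
     [ "$extract_outcome" = "success" ]; then
    echo true
  else
    echo false
  fi
}

# Happy path: artifact built, downloaded, and extracted.
echo "all ok:            $(should_skip_scala_build success success success)"
# precompile failed, so both later steps were skipped.
echo "precompile failed: $(should_skip_scala_build failure skipped skipped)"
# Download failed (tolerated by continue-on-error), extract skipped.
echo "download failed:   $(should_skip_scala_build success failure skipped)"
# Extract failed, for example on a truncated archive.
echo "extract failed:    $(should_skip_scala_build success success failure)"
```

In the workflow itself, only the extract outcome is checked in "Run tests", because each step is already gated on the one before it.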
### Why are the changes needed?
Saves CI compute. Per-shard, against contemporaneous baseline runs:
| Shard | Baseline `Run tests` | This PR `Run tests` | Net (after ~2m DL+extract) |
|---|---:|---:|---:|
| `core, unsafe, ...` | ~75m | ~65m | ~10m saved |
| `api, catalyst, hive-thriftserver` | ~59m | ~45m | ~14m saved |
| `hive - slow tests` | ~52m | ~43m | ~9m saved |
| `sql - slow tests` | ~101m | ~94m | ~7m saved |
Aggregated across the 9 JVM matrix entries, **~60m of CI compute is saved
per workflow run** (~45m net if `precompile` wasn't already running for the
pyspark matrix).
Per-test execution times are unchanged (verified against `hive - slow tests`
JUnit reports: 2220 tests, totals within 1% across runs - the saving comes
entirely from skipping the SBT build phase, not from tests running faster).
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure only.
### How was this patch tested?
CI on this PR runs the new path. Each JVM matrix entry's "Run tests" log
shows `Reusing precompiled artifact, skipping local SBT build.`, and the
extract step succeeds before `<module>/test` runs. If the `precompile` job
does not succeed or its artifact is missing, the entry falls back to the
original local SBT build path.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
Closes #55762 from zhengruifeng/share-precompile-build-matrix.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit e5715bdfe17ed70cde0a0bcbb6c5a4ee8f7d8c9d)
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.github/workflows/build_and_test.yml | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 6c5929ad6ae6..531eddfc4d31 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -263,8 +263,8 @@ jobs:
# Build: build Spark and run the tests for specified modules.
build:
name: "Build modules: ${{ matrix.modules }} ${{ matrix.comment }}"
- needs: precondition
- if: fromJson(needs.precondition.outputs.required).build == 'true'
+ needs: [precondition, precompile]
+ if: (!cancelled()) && fromJson(needs.precondition.outputs.required).build == 'true'
runs-on: ubuntu-latest
timeout-minutes: 150
strategy:
@@ -401,6 +401,20 @@ jobs:
run: |
python3.12 -m pip install 'numpy>=1.23.2' pyarrow 'pandas==2.3.3' pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.5' 'zstandard==0.25.0'
python3.12 -m pip list
+ - name: Download precompiled artifact
+ id: download-precompiled
+ if: needs.precompile.result == 'success'
+ continue-on-error: true
+ uses: actions/download-artifact@v6
+ with:
+ name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+ - name: Extract precompiled artifact
+ id: extract-precompiled
+ if: steps.download-precompiled.outcome == 'success'
+ continue-on-error: true
+ run: |
+ tar -xzf compile-artifact.tar.gz
+ rm compile-artifact.tar.gz
# Run the tests.
- name: Run tests
env: ${{ fromJSON(inputs.envs) }}
@@ -411,9 +425,13 @@ jobs:
# Hive "other tests" test needs larger metaspace size based on experiment.
if [[ "$MODULES_TO_TEST" == "hive" ]] && [[ "$EXCLUDED_TAGS" == "org.apache.spark.tags.SlowHiveTest" ]]; then export METASPACE_SIZE=2g; fi
# SPARK-46283: should delete the following env replacement after SPARK 3.x EOL
- if [[ "$MODULES_TO_TEST" == *"streaming-kinesis-asl"* ]] && [[ "${{ inputs.branch }}" =~ ^branch-3 ]]; then
+ if [[ "$MODULES_TO_TEST" == *"streaming-kinesis-asl"* ]] && [[ "${{ inputs.branch }}" =~ ^branch-3 ]]; then
MODULES_TO_TEST=${MODULES_TO_TEST//streaming-kinesis-asl, /}
fi
+ if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
+ export SKIP_SCALA_BUILD=true
+ echo "Reusing precompiled artifact, skipping local SBT build."
+ fi
export SERIAL_SBT_TESTS=1
./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS"
- name: Upload test results to report
@@ -543,6 +561,7 @@ jobs:
needs: precondition
if: >-
(!cancelled()) && (
+ fromJson(needs.precondition.outputs.required).build == 'true' ||
fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
fromJson(needs.precondition.outputs.required).pyspark-pandas == 'true' ||
fromJson(needs.precondition.outputs.required).pyspark-install == 'true' ||