(spark) branch branch-4.x updated: [SPARK-56943][INFRA] Share SBT precompile artifact with JVM build matrix

ruifengz Tue, 19 May 2026 18:16:49 -0700

This is an automated email from the ASF dual-hosted git repository.

zhengruifeng pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.x by this push:
     new 85af75e4940a [SPARK-56943][INFRA] Share SBT precompile artifact with JVM build matrix
85af75e4940a is described below

commit 85af75e4940a297c2d871f7d49b48b13dc34fefa
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Wed May 20 09:15:54 2026 +0800

    [SPARK-56943][INFRA] Share SBT precompile artifact with JVM build matrix
    
    ### What changes were proposed in this pull request?
    
    Have each JVM `build` matrix entry consume the shared `precompile` artifact
    instead of rebuilding Spark from scratch. The artifact already exists - the
    `precompile` job runs today for the pyspark matrix (SPARK-56768). This PR
    just wires the JVM matrix to download, extract, and set
    `SKIP_SCALA_BUILD=true`, which short-circuits the two SBT build calls
    (`build_spark_sbt`, `build_spark_assembly_sbt`) that each entry used to run
    before its tests.
    
    Each JVM matrix entry runs three SBT calls today:
    
    1. `build_spark_sbt`: `Test/package`, kinesis-asl-assembly, connect-assembly
    2. `build_spark_assembly_sbt`: `assembly/package`
    3. `run_scala_tests_sbt`: `<module>/test` (per-entry, varies)
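
The three calls listed above, and the effect of `SKIP_SCALA_BUILD` on them, can be sketched as follows. This is a minimal illustration and not the code in `dev/run-tests.py`: the function `planned_sbt_calls` is hypothetical, and treating the literal string `true` as the enabling value is an assumption.

```python
from typing import List, Mapping


def planned_sbt_calls(env: Mapping[str, str]) -> List[str]:
    """Return the SBT calls one matrix entry would make (illustrative only)."""
    calls = []
    # Assumption: the build phase is skipped only when the variable is "true".
    if env.get("SKIP_SCALA_BUILD") != "true":
        calls.append("build_spark_sbt")           # call (1)
        calls.append("build_spark_assembly_sbt")  # call (2)
    # Call (3) always runs; only the module under test varies per entry.
    calls.append("run_scala_tests_sbt")
    return calls


if __name__ == "__main__":
    print(planned_sbt_calls({}))
    print(planned_sbt_calls({"SKIP_SCALA_BUILD": "true"}))
```

With the variable unset, all three calls are planned; with it set, only the per-entry test call remains.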
    
    Calls (1) and (2) are byte-equivalent across all 9 entries (same 11 profiles,
    same goals) and are exactly what the `precompile` job already produces. Call
    (3) still runs per entry; with mtime-preserving tar the extracted classes
    look fresh to Zinc, so no recompilation happens.
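
The mtime-preserving behaviour mentioned above can be checked with a small round trip. This sketch uses Python's standard `tarfile` module instead of the `tar` CLI the workflow runs, and the file name `Example.class` is made up for illustration. It only shows that modification times survive pack and extract; it does not model how Zinc decides what to recompile.

```python
import os
import tarfile
import tempfile


def roundtrip_mtime(mtime: int) -> int:
    """Pack one file with a known mtime, extract it elsewhere, return its mtime."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "src")
        dest = os.path.join(work, "dest")
        os.makedirs(src)
        os.makedirs(dest)

        # Hypothetical compiled class file with a deliberately old timestamp.
        class_file = os.path.join(src, "Example.class")
        with open(class_file, "wb") as f:
            f.write(b"\xca\xfe\xba\xbe")
        os.utime(class_file, (mtime, mtime))

        archive = os.path.join(work, "compile-artifact.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(class_file, arcname="Example.class")

        # Use the "data" extraction filter where this Python version has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest, **kwargs)

        return int(os.stat(os.path.join(dest, "Example.class")).st_mtime)


if __name__ == "__main__":
    original = 1_700_000_000
    print(roundtrip_mtime(original) == original)
```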
    
    Workflow changes (`.github/workflows/build_and_test.yml`):
    
    - `precompile.if`: add `build == 'true'` so the artifact is also produced for
      JVM-only changes.
    - `build.needs`: add `precompile`; gate with `(!cancelled()) && ...` so the
      matrix still runs if precompile failed or was skipped.
    - Add `Download precompiled artifact` and `Extract precompiled artifact`
      steps with `continue-on-error: true`, gated on the upstream outcomes -
      same fallback pattern the pyspark matrix uses.
    - In the existing "Run tests" block, export `SKIP_SCALA_BUILD=true` only
      when extract succeeded. `dev/run-tests.py` already honors this env var.
    
    If any part of the precompile chain fails, the entry falls back to the
    pre-PR local-build path.
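
The fallback chain described above can be modelled in a few lines. This is a sketch of the step gating as it appears in the diff, not workflow code: the function names and the boolean `download_ok` / `extract_ok` inputs are hypothetical stand-ins for whether each step's command would succeed. It relies on the GitHub Actions behaviour that a step's `outcome` is its result before `continue-on-error` is applied, and that a step whose `if` is false is reported as `skipped`.

```python
from typing import Tuple


def step_outcomes(precompile_result: str,
                  download_ok: bool,
                  extract_ok: bool) -> Tuple[str, str]:
    """Return (download outcome, extract outcome) for one matrix entry."""
    # "Download precompiled artifact" runs only if precompile succeeded.
    if precompile_result == "success":
        download = "success" if download_ok else "failure"
    else:
        download = "skipped"
    # "Extract precompiled artifact" runs only if the download succeeded.
    if download == "success":
        extract = "success" if extract_ok else "failure"
    else:
        extract = "skipped"
    return download, extract


def reuses_artifact(precompile_result: str,
                    download_ok: bool = True,
                    extract_ok: bool = True) -> bool:
    """True when "Run tests" would export SKIP_SCALA_BUILD=true."""
    _, extract = step_outcomes(precompile_result, download_ok, extract_ok)
    return extract == "success"


if __name__ == "__main__":
    for result in ("success", "failure", "skipped"):
        print(result, reuses_artifact(result))
```

Any outcome other than a successful extract leaves `SKIP_SCALA_BUILD` unset, which corresponds to the local-build path.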
    
    ### Why are the changes needed?
    
    Saves CI compute. Per-shard, against contemporaneous baseline runs:
    
    | Shard | Baseline `Run tests` | This PR `Run tests` | Net (after ~2m DL+extract) |
    |---|---:|---:|---:|
    | `core, unsafe, ...`              | ~75m | ~65m | ~10m saved |
    | `api, catalyst, hive-thriftserver` | ~59m | ~45m | ~14m saved |
    | `hive - slow tests`              | ~52m | ~43m | ~9m saved  |
    | `sql - slow tests`               | ~101m| ~94m | ~7m saved  |
    
    Aggregated across the 9 JVM matrix entries, this saves roughly **60m of CI
    compute per workflow run** (~45m net if `precompile` wasn't already running
    for the pyspark matrix).
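
As a quick arithmetic check on the table, the sketch below recomputes the per-shard difference from the approximate `Run tests` durations and sums the four listed shards. The durations are copied from the table; the function name is hypothetical. The four listed shards account for about 40m, which would leave roughly 20m of the aggregate to the five entries not shown, assuming the aggregate is measured the same way.

```python
from typing import Dict, Tuple

# Approximate minutes from the table: (baseline, this PR).
SHARDS: Dict[str, Tuple[int, int]] = {
    "core, unsafe, ...": (75, 65),
    "api, catalyst, hive-thriftserver": (59, 45),
    "hive - slow tests": (52, 43),
    "sql - slow tests": (101, 94),
}


def saved_minutes(shards: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Baseline minus this-PR duration for each shard."""
    return {name: baseline - pr for name, (baseline, pr) in shards.items()}


if __name__ == "__main__":
    saved = saved_minutes(SHARDS)
    for name, minutes in saved.items():
        print(f"{name}: ~{minutes}m")
    print(f"listed shards total: ~{sum(saved.values())}m")
```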
    
    Per-test execution times are unchanged (verified against `hive - slow tests`
    JUnit reports: 2220 tests, totals within 1% across runs - the saving comes
    entirely from skipping the SBT build phase, not from tests running faster).
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. CI infrastructure only.
    
    ### How was this patch tested?
    
    CI on this PR runs the new path. Each JVM matrix entry's "Run tests" log
    shows `Reusing precompiled artifact, skipping local SBT build.`, and the
    extract step succeeds before `<module>/test` runs. If the `precompile` job
    does not succeed, or its artifact cannot be downloaded or extracted, the
    entry falls back to the original local SBT build path.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Opus 4.7)
    
    Closes #55762 from zhengruifeng/share-precompile-build-matrix.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
    (cherry picked from commit e5715bdfe17ed70cde0a0bcbb6c5a4ee8f7d8c9d)
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 .github/workflows/build_and_test.yml | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 6c5929ad6ae6..531eddfc4d31 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -263,8 +263,8 @@ jobs:
   # Build: build Spark and run the tests for specified modules.
   build:
     name: "Build modules: ${{ matrix.modules }} ${{ matrix.comment }}"
-    needs: precondition
-    if: fromJson(needs.precondition.outputs.required).build == 'true'
+    needs: [precondition, precompile]
+    if: (!cancelled()) && fromJson(needs.precondition.outputs.required).build == 'true'
     runs-on: ubuntu-latest
     timeout-minutes: 150
     strategy:
@@ -401,6 +401,20 @@ jobs:
       run: |
         python3.12 -m pip install 'numpy>=1.23.2' pyarrow 'pandas==2.3.3' pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.5' 'zstandard==0.25.0'
         python3.12 -m pip list
+    - name: Download precompiled artifact
+      id: download-precompiled
+      if: needs.precompile.result == 'success'
+      continue-on-error: true
+      uses: actions/download-artifact@v6
+      with:
+        name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+    - name: Extract precompiled artifact
+      id: extract-precompiled
+      if: steps.download-precompiled.outcome == 'success'
+      continue-on-error: true
+      run: |
+        tar -xzf compile-artifact.tar.gz
+        rm compile-artifact.tar.gz
     # Run the tests.
     - name: Run tests
       env: ${{ fromJSON(inputs.envs) }}
@@ -411,9 +425,13 @@ jobs:
         # Hive "other tests" test needs larger metaspace size based on experiment.
         if [[ "$MODULES_TO_TEST" == "hive" ]] && [[ "$EXCLUDED_TAGS" == "org.apache.spark.tags.SlowHiveTest" ]]; then export METASPACE_SIZE=2g; fi
         # SPARK-46283: should delete the following env replacement after SPARK 3.x EOL
-        if [[ "$MODULES_TO_TEST" == *"streaming-kinesis-asl"* ]] && [[ "${{ inputs.branch }}" =~ ^branch-3 ]]; then 
+        if [[ "$MODULES_TO_TEST" == *"streaming-kinesis-asl"* ]] && [[ "${{ inputs.branch }}" =~ ^branch-3 ]]; then
           MODULES_TO_TEST=${MODULES_TO_TEST//streaming-kinesis-asl, /}
         fi
+        if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
+          export SKIP_SCALA_BUILD=true
+          echo "Reusing precompiled artifact, skipping local SBT build."
+        fi
         export SERIAL_SBT_TESTS=1
         ./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS"
     - name: Upload test results to report
@@ -543,6 +561,7 @@ jobs:
     needs: precondition
     if: >-
       (!cancelled()) && (
+        fromJson(needs.precondition.outputs.required).build == 'true' ||
         fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
         fromJson(needs.precondition.outputs.required).pyspark-pandas == 'true' ||
         fromJson(needs.precondition.outputs.required).pyspark-install == 'true' ||

