This is an automated email from the ASF dual-hosted git repository.

zhengruifeng pushed a commit to branch branch-4.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.2 by this push:
     new 8dffe020266e [SPARK-56831][INFRA][R] Share SBT precompile artifact with sparkr CI job
8dffe020266e is described below

commit 8dffe020266e073ecd5d3206b5d299005ba7da25
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Tue May 12 13:46:11 2026 +0800

    [SPARK-56831][INFRA][R] Share SBT precompile artifact with sparkr CI job
    
    ### What changes were proposed in this pull request?
    
    Follow-up to 
[SPARK-56768](https://issues.apache.org/jira/browse/SPARK-56768) 
(apache/spark#55726), which introduced a shared `precompile` CI job that runs 
Spark's SBT build once and publishes the resulting `target/` trees as a GitHub 
Actions artifact for the pyspark matrix entries to consume. This PR extends 
that same artifact to the `sparkr` build.
    
    Concretely:
    
    - The `precompile` job's `if:` gate now also fires when `sparkr == 'true'` 
is set in the precondition output, so the artifact is still built when sparkr 
is the only changed module.
    - The `sparkr` job adds `precompile` to `needs:`, downloads and extracts 
the artifact (with the same graceful fallback as the pyspark matrix), and 
exports `SKIP_SCALA_BUILD=true` for `dev/run-tests.py` only when the artifact 
was successfully extracted.
    - No `dev/run-tests.py` change is needed — the `SKIP_SCALA_BUILD` gate 
landed with SPARK-56768.
    
    ### Optional: graceful fallback if precompile fails
    
    Same pattern as the pyspark matrix:
    
    - The "Download precompiled artifact" step is gated on 
`needs.precompile.result == 'success'` and has `continue-on-error: true`.
    - The "Extract precompiled artifact" step is gated on the download 
succeeding and also has `continue-on-error: true`.
    - Inside the "Run tests" bash block, `SKIP_SCALA_BUILD=true` is exported 
only when `steps.extract-precompiled.outcome == 'success'`. Otherwise it stays 
unset and `dev/run-tests.py` falls back to the original local SBT build.
    
    So a precompile, download, or extract failure degrades sparkr to the 
pre-PR behavior instead of failing the workflow.
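    
    The gate inside "Run tests" can be sketched as a standalone shell 
    function. This is an illustration only: `maybe_skip_scala_build` is a 
    hypothetical name, its argument stands in for 
    `${{ steps.extract-precompiled.outcome }}`, and the fallback message is 
    not printed by the real workflow. The check uses `outcome` because it 
    reports a step's result before `continue-on-error` is applied, so a failed 
    or skipped extract step is still visible to the gate.
    
    ```shell
    # Hypothetical helper mirroring the gate in the "Run tests" step.
    # $1 stands in for ${{ steps.extract-precompiled.outcome }}.
    maybe_skip_scala_build() {
      if [ "$1" = "success" ]; then
        # Artifact was extracted: tell dev/run-tests.py to skip the SBT build.
        export SKIP_SCALA_BUILD=true
        echo "Reusing precompiled artifact, skipping local SBT build."
      else
        # Any other outcome (failure, skipped, cancelled): leave it unset.
        unset SKIP_SCALA_BUILD
        # Illustrative message only; the workflow itself prints nothing here.
        echo "Falling back to local SBT build."
      fi
    }
    ```
    
    Calling `maybe_skip_scala_build skipped` leaves `SKIP_SCALA_BUILD` unset, 
    which matches the case where the precompile job did not succeed and the 
    download step never ran.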
    
    ### Why are the changes needed?
    
    The sparkr job today runs the same ~13m of redundant SBT compile that the 
pyspark matrix used to run. Reusing the existing precompile artifact removes 
that redundant work. The `precompile` job is already running in any workflow 
run where pyspark changes are present; adding sparkr as another consumer is 
essentially free (just another download of the same artifact).
    
    When sparkr is the only changed module, the `precompile` job is now 
scheduled to run anyway (via the new `sparkr == 'true'` clause in its `if:` 
gate), so this case picks up the same saving.
    
    ### Estimated savings
    
    | | Per sparkr run |
    |---|---:|
    | Redundant SBT compile in sparkr today | ~13m |
    | Add back: download + extract overhead | ~1m |
    | **Net CI compute saved per sparkr run** | **~12m** |
    
    This is on top of the ~96m / ~14% already saved by SPARK-56768. The sparkr 
job's own wall-clock time should drop by roughly the same ~12m. sparkr is not 
on the critical path, so the workflow's overall wall-clock time is still driven 
by the pyspark matrix.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. CI infrastructure change only.
    
    ### How was this patch tested?
    
    The change is exercised by the CI run of this PR itself, when the sparkr 
job runs. The expected log signature inside "Run tests" is `Reusing precompiled 
artifact, skipping local SBT build.`, mirroring what the pyspark matrix already 
prints. If the precompile artifact is not available (precompile job failed, or 
this is some future caller that doesn't enable it), sparkr falls back to the 
local SBT build path, which is identical to today's behavior.
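    
    One way to confirm which path a run took is to search a saved copy of the 
    "Run tests" log for the signature line. A minimal sketch, assuming the log 
    has been saved to a local file; `used_precompiled_artifact` is a 
    hypothetical helper, not part of the repository:
    
    ```shell
    # Hypothetical helper: succeeds if the given log file contains the
    # signature line that the "Run tests" step prints on the reuse path.
    used_precompiled_artifact() {
      # -q: no output, exit status only; -F: match the text literally.
      grep -qF "Reusing precompiled artifact, skipping local SBT build." "$1"
    }
    ```
    
    A non-zero exit status means the line was absent, which is consistent with 
    the job having taken the local SBT build path.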
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Opus 4.7)
    
    Closes #55761 from zhengruifeng/share-precompile-sparkr.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
    (cherry picked from commit ef4e78489f4a2fc2635e96170934ee5791534588)
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 .github/workflows/build_and_test.yml | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 99223efbf0dd..3f3400153cde 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -540,7 +540,8 @@ jobs:
       (!cancelled()) && (
         fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
         fromJson(needs.precondition.outputs.required).pyspark-pandas == 'true' ||
-        fromJson(needs.precondition.outputs.required).pyspark-install == 'true')
+        fromJson(needs.precondition.outputs.required).pyspark-install == 'true' ||
+        fromJson(needs.precondition.outputs.required).sparkr == 'true')
     name: "Precompile Spark"
     runs-on: ubuntu-latest
     timeout-minutes: 60
@@ -806,7 +807,7 @@ jobs:
         path: "**/target/unit-tests.log"
 
   sparkr:
-    needs: [precondition, infra-image]
+    needs: [precondition, infra-image, precompile]
     # always run if sparkr == 'true', even infra-image is skip (such as non-master job)
     if: (!cancelled()) && fromJson(needs.precondition.outputs.required).sparkr == 'true'
     name: "Build modules: sparkr"
@@ -865,6 +866,20 @@ jobs:
       with:
         distribution: zulu
         java-version: ${{ inputs.java }}
+    - name: Download precompiled artifact
+      id: download-precompiled
+      if: needs.precompile.result == 'success'
+      continue-on-error: true
+      uses: actions/download-artifact@v6
+      with:
+        name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+    - name: Extract precompiled artifact
+      id: extract-precompiled
+      if: steps.download-precompiled.outcome == 'success'
+      continue-on-error: true
+      run: |
+        tar -xzf compile-artifact.tar.gz
+        rm compile-artifact.tar.gz
     - name: Run tests
       env: ${{ fromJSON(inputs.envs) }}
       run: |
@@ -872,6 +887,10 @@ jobs:
         # R issues at docker environment
         export TZ=UTC
         export _R_CHECK_SYSTEM_CLOCK_=FALSE
+        if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
+          export SKIP_SCALA_BUILD=true
+          echo "Reusing precompiled artifact, skipping local SBT build."
+        fi
         ./dev/run-tests --parallelism 1 --modules sparkr
     - name: Upload test results to report
       if: always()

