This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch branch-4.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.2 by this push:
new 8dffe020266e [SPARK-56831][INFRA][R] Share SBT precompile artifact
with sparkr CI job
8dffe020266e is described below
commit 8dffe020266e073ecd5d3206b5d299005ba7da25
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Tue May 12 13:46:11 2026 +0800
[SPARK-56831][INFRA][R] Share SBT precompile artifact with sparkr CI job
### What changes were proposed in this pull request?
Follow-up to
[SPARK-56768](https://issues.apache.org/jira/browse/SPARK-56768)
(apache/spark#55726), which introduced a shared `precompile` CI job that runs
Spark's SBT build once and publishes the resulting `target/` trees as a GitHub
Actions artifact for the pyspark matrix entries to consume. This PR extends
that same artifact to the `sparkr` build.
Concretely:
- The `precompile` job's `if:` gate now also fires when the precondition
output has `sparkr == 'true'`, so the artifact is also built when sparkr is
the only changed module.
- The `sparkr` job adds `precompile` to `needs:`, downloads and extracts
the artifact (with the same graceful fallback as the pyspark matrix), and
exports `SKIP_SCALA_BUILD=true` for `dev/run-tests.py` only when the artifact
was successfully extracted.
- No `dev/run-tests.py` change is needed: the `SKIP_SCALA_BUILD` gate
already landed with SPARK-56768.
### Optional: graceful fallback if precompile fails
Same pattern as the pyspark matrix:
- The "Download precompiled artifact" step is gated on
`needs.precompile.result == 'success'` and has `continue-on-error: true`.
- The "Extract precompiled artifact" step is gated on the download
succeeding and also has `continue-on-error: true`.
- Inside the "Run tests" bash block, `SKIP_SCALA_BUILD=true` is exported
only when `steps.extract-precompiled.outcome == 'success'`. Otherwise it stays
unset and `dev/run-tests.py` falls back to the original local SBT build.
So a precompile, download, or extract failure degrades sparkr to its pre-PR
behavior instead of failing the workflow.
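The three gates above can be modeled as a small shell function. This is only a sketch: `skip_scala_build` is a hypothetical helper that does not exist in the Spark repository, and its arguments stand in for the values GitHub Actions would report for `needs.precompile.result` and the two step outcomes. In the workflow itself, the first two checks are step-level `if:` conditions and only the last one appears inside the "Run tests" block.

```shell
#!/bin/sh
# Hypothetical model of the fallback chain (not part of the Spark repo).
# Prints "true" when SKIP_SCALA_BUILD would be exported, "false" otherwise.
skip_scala_build() {
  precompile_result=$1   # stands in for needs.precompile.result
  download_outcome=$2    # stands in for steps.download-precompiled.outcome
  extract_outcome=$3     # stands in for steps.extract-precompiled.outcome

  # The download step only runs when the precompile job succeeded.
  [ "$precompile_result" = "success" ] || { echo "false"; return 0; }
  # The extract step only runs when the download succeeded.
  [ "$download_outcome" = "success" ] || { echo "false"; return 0; }
  # SKIP_SCALA_BUILD is exported only when the extract succeeded.
  [ "$extract_outcome" = "success" ] || { echo "false"; return 0; }
  echo "true"
}

skip_scala_build success success success   # prints: true
skip_scala_build failure skipped skipped   # prints: false (precompile failed)
skip_scala_build success failure skipped   # prints: false (download failed)
skip_scala_build success success failure   # prints: false (extract failed)
```

Because both steps set `continue-on-error: true`, a failed step still reports its `outcome` as `failure` while the job keeps going, which is why the checks read `outcome`.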
### Why are the changes needed?
The sparkr job today runs the same ~13m of redundant SBT compile that the
pyspark matrix used to run. Reusing the existing precompile artifact removes
that redundant work. The `precompile` job already runs in any workflow run
where pyspark changes are present, so adding sparkr as another consumer is
essentially free (just one more download of the same artifact).
When sparkr is the only changed module, the `precompile` job is now
scheduled to run anyway (via the new `sparkr == 'true'` clause in its `if:`
gate), so this case picks up the same saving.
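The widened gate can be sketched as a boolean check over the four precondition flags. `precompile_should_run` is a hypothetical name used for illustration only; the sketch leaves out the `!cancelled()` part of the real expression and assumes each flag arrives as the string `true` or `false`.

```shell
#!/bin/sh
# Hypothetical model of the precompile job's `if:` gate after this change.
# Arguments, in order: pyspark, pyspark-pandas, pyspark-install, sparkr.
precompile_should_run() {
  pyspark=$1
  pyspark_pandas=$2
  pyspark_install=$3
  sparkr=$4

  # Any single consumer being required is enough to schedule the job.
  if [ "$pyspark" = "true" ] || [ "$pyspark_pandas" = "true" ] ||
     [ "$pyspark_install" = "true" ] || [ "$sparkr" = "true" ]; then
    echo "run"
  else
    echo "skip"
  fi
}

precompile_should_run false false false true    # sparkr-only change, prints: run
precompile_should_run true  false false false   # pyspark change, prints: run
precompile_should_run false false false false   # no consumer required, prints: skip
```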
### Estimated savings
| Item | Per sparkr run |
|---|---:|
| Redundant SBT compile in sparkr today | ~13m |
| Add back: download + extract overhead | ~1m |
| **Net CI compute saved per sparkr run** | **~12m** |
This is on top of the ~96m / ~14% already saved by SPARK-56768. The sparkr
job's own wall-clock time should drop by roughly the same amount. The
workflow's overall wall-clock time is not expected to change, because sparkr
is not on the critical path; the pyspark matrix still determines it.
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure change only.
### How was this patch tested?
The change is exercised by this PR's own CI run, when the sparkr job runs.
The expected log signature inside "Run tests" is
`Reusing precompiled artifact, skipping local SBT build.`, mirroring what the
pyspark matrix already prints. If the precompile artifact is not available
(the precompile job failed, or a future caller does not enable it), sparkr
falls back to the local SBT build path, which is identical to today's behavior.
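One way to confirm which path a run took is to search the "Run tests" log for that signature. The sketch below assumes the step log has been saved to a local file; `used_precompiled_artifact` and the temporary log file are hypothetical and only illustrate the check.

```shell
#!/bin/sh
# Hypothetical check, assuming the "Run tests" step log was saved to a file.
LOG_SIGNATURE="Reusing precompiled artifact, skipping local SBT build."

used_precompiled_artifact() {
  # -F matches the signature as a fixed string, -q suppresses output;
  # the exit status tells the caller whether the line was found.
  grep -Fq "$LOG_SIGNATURE" "$1"
}

# Demo with a stand-in log file containing the signature.
log_file=$(mktemp)
printf '%s\n' "export TZ=UTC" "$LOG_SIGNATURE" > "$log_file"

if used_precompiled_artifact "$log_file"; then
  echo "artifact reused"
else
  echo "local SBT build"
fi

rm -f "$log_file"
```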
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
Closes #55761 from zhengruifeng/share-precompile-sparkr.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit ef4e78489f4a2fc2635e96170934ee5791534588)
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.github/workflows/build_and_test.yml | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 99223efbf0dd..3f3400153cde 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -540,7 +540,8 @@ jobs:
(!cancelled()) && (
fromJson(needs.precondition.outputs.required).pyspark == 'true' ||
fromJson(needs.precondition.outputs.required).pyspark-pandas == 'true' ||
- fromJson(needs.precondition.outputs.required).pyspark-install == 'true')
+ fromJson(needs.precondition.outputs.required).pyspark-install == 'true' ||
+ fromJson(needs.precondition.outputs.required).sparkr == 'true')
name: "Precompile Spark"
runs-on: ubuntu-latest
timeout-minutes: 60
@@ -806,7 +807,7 @@ jobs:
path: "**/target/unit-tests.log"
sparkr:
- needs: [precondition, infra-image]
+ needs: [precondition, infra-image, precompile]
# always run if sparkr == 'true', even infra-image is skip (such as non-master job)
if: (!cancelled()) && fromJson(needs.precondition.outputs.required).sparkr == 'true'
name: "Build modules: sparkr"
@@ -865,6 +866,20 @@ jobs:
with:
distribution: zulu
java-version: ${{ inputs.java }}
+ - name: Download precompiled artifact
+ id: download-precompiled
+ if: needs.precompile.result == 'success'
+ continue-on-error: true
+ uses: actions/download-artifact@v6
+ with:
+ name: spark-compile-${{ inputs.branch }}-${{ github.run_id }}
+ - name: Extract precompiled artifact
+ id: extract-precompiled
+ if: steps.download-precompiled.outcome == 'success'
+ continue-on-error: true
+ run: |
+ tar -xzf compile-artifact.tar.gz
+ rm compile-artifact.tar.gz
- name: Run tests
env: ${{ fromJSON(inputs.envs) }}
run: |
@@ -872,6 +887,10 @@ jobs:
# R issues at docker environment
export TZ=UTC
export _R_CHECK_SYSTEM_CLOCK_=FALSE
+ if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
+ export SKIP_SCALA_BUILD=true
+ echo "Reusing precompiled artifact, skipping local SBT build."
+ fi
./dev/run-tests --parallelism 1 --modules sparkr
- name: Upload test results to report
if: always()