This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 869adad659f8 [SPARK-56866][INFRA] Pin downstream actions/checkout to a single resolved SHA
869adad659f8 is described below
commit 869adad659f8ce5c449daba4123f779f76b41ba6
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Tue May 19 08:58:23 2026 +0800
[SPARK-56866][INFRA] Pin downstream actions/checkout to a single resolved SHA
### What changes were proposed in this pull request?
In `.github/workflows/build_and_test.yml`, add a step to the `precondition`
job that captures `git rev-parse HEAD` right after the apache/spark checkout
and exposes it as a `head_sha` output. Then switch every downstream
`actions/checkout` from `ref: ${{ inputs.branch }}` to
`ref: ${{ needs.precondition.outputs.head_sha }}`. The `precondition` job's own
checkout still resolves `inputs.branch`; the 11 downstream checkouts (`build`,
`infra-image`, `precompile`, `pyspark`, `sparkr`, `buf`, ` [...]
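The `resolve-sha` step relies on a GitHub Actions convention: a step publishes an output by appending a `name=value` line to the file named by `$GITHUB_OUTPUT`. `head_sha: ${{ steps.resolve-sha.outputs.head_sha }}` then promotes that step output to a job output, and downstream jobs read it through `needs.precondition.outputs.head_sha`. The Python below sketches only that file round trip. It is not the runner's implementation. It assumes the single-line `name=value` form (the runner also accepts a multiline delimiter form, which this sketch ignores), and `write_output`, `read_outputs`, and `is_full_sha` are hypothetical helpers.

```python
import os
import re
import tempfile

# Hypothetical helpers that mimic the single-line "name=value" form of the
# $GITHUB_OUTPUT file. Illustration only, not the Actions runner's code.


def write_output(path: str, name: str, value: str) -> None:
    """Append one output, like `echo "name=value" >> $GITHUB_OUTPUT`."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


def read_outputs(path: str) -> dict:
    """Parse every "name=value" line of the file into a dict."""
    outputs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            name, sep, value = line.rstrip("\n").partition("=")
            if sep:  # skip lines that contain no '='
                outputs[name] = value
    return outputs


# In a SHA-1 repository, `git rev-parse HEAD` prints 40 lowercase hex digits.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_full_sha(value: str) -> bool:
    """True if value looks like a full commit SHA rather than a branch name."""
    return FULL_SHA.fullmatch(value) is not None


if __name__ == "__main__":
    # Sample value: the commit hash of this change, used only as test data.
    sha = "869adad659f8ce5c449daba4123f779f76b41ba6"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "github_output")
        write_output(path, "head_sha", sha)
        outputs = read_outputs(path)
    print(outputs["head_sha"] == sha, is_full_sha(outputs["head_sha"]))  # True True
```

A full-SHA check like `is_full_sha` is one way a downstream job could confirm that it received a pinned commit rather than a branch name. The workflow in this diff does not perform such a check.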
### Why are the changes needed?
Today each `actions/checkout` step independently re-resolves
`ref: ${{ inputs.branch }}` (default `master`) at the moment a runner picks
up its job. Different jobs in the same workflow run can therefore end up
testing different commits.
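The race can be shown with a toy Python model. The branch is a list of `(minute, commit)` pushes, and a checkout returns whichever commit was the branch tip at the minute its job started. The commit labels `c1` to `c3` and the job start times are made up for illustration. A real checkout fetches from GitHub rather than reading a list. The model only shows that per-job resolution can disagree, while resolving once cannot.

```python
from bisect import bisect_right

# Toy model: (minute pushed, commit label) for each push to the branch,
# in chronological order. Labels and timings are hypothetical.
PUSHES = [(0, "c1"), (5, "c2"), (12, "c3")]

# Minute at which each job's checkout step actually runs (hypothetical).
JOB_START = {"precondition": 0, "precompile": 1, "pyspark": 18}


def resolve(pushes, minute):
    """Return the branch tip as seen by a checkout that runs at `minute`."""
    times = [t for t, _ in pushes]
    idx = bisect_right(times, minute) - 1
    if idx < 0:
        raise ValueError("branch has no commits yet")
    return pushes[idx][1]


def per_job_refs(pushes, starts):
    """Old behavior: every job re-resolves the branch name on its own."""
    return {job: resolve(pushes, m) for job, m in starts.items()}


def pinned_refs(pushes, starts):
    """New behavior: resolve once in precondition, reuse that SHA everywhere."""
    head_sha = resolve(pushes, starts["precondition"])
    return {job: head_sha for job in starts}


print(per_job_refs(PUSHES, JOB_START))
# {'precondition': 'c1', 'precompile': 'c1', 'pyspark': 'c3'}
print(pinned_refs(PUSHES, JOB_START))
# {'precondition': 'c1', 'precompile': 'c1', 'pyspark': 'c1'}
```

Resolving once and passing the result along is the idea behind the `head_sha` job output. Every consumer reads a value fixed at one point in time instead of re-reading a moving branch.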
**This is a long-standing issue.** `ref: ${{ inputs.branch }}` has been in
`build_and_test.yml` since commit `9e468cf010f` (SPARK-39521, 2022-06-21),
nearly four years, and the race has existed the entire time. It usually goes
unnoticed because a typical master commit does not change the JVM and Python
sides together. Even when two jobs see different commits, the mismatch does
not break any test.
**It becomes a real problem during merge bursts.** The commit rate on master
varies widely: release-prep windows, end-of-week merges, and the APAC/EU
working-hours overlap regularly produce 3–6 commits in 20 minutes. The drift
window for the `pyspark` jobs is structurally about 17 minutes (the `precompile`
duration) plus runner queue wait. During a merge burst, the probability that
at least one commit lands inside that window is therefore close to 1. When the
unlucky commit happens to add a
tightly-coupled change — new Spark Con [...]
```
[CONNECT_INVALID_PLAN.INVALID_ONE_OF_FIELD_NOT_SET]
The Spark Connect plan is invalid. This oneOf field in
spark.connect.Relation is not set: RELTYPE_NOT_SET
```
Concrete example from 2026-05-14:
- Run
[25835824862](https://github.com/apache/spark/actions/runs/25835824862)
triggered by `e19bc35c` (SPARK-56844) — `pyspark-connect` failed with 19
NEAREST BY errors.
- Run
[25835929554](https://github.com/apache/spark/actions/runs/25835929554)
triggered ~3 minutes later by the next commit `13380e78` (SPARK-56395, which
added the NEAREST BY feature) — same job passed.
The first run's `precompile` checked out `e19bc35c` (no NEAREST BY server
code), but by the time its `pyspark-connect` job actually started 17 minutes
later, master was at `13380e78` and `actions/checkout` resolved that newer
commit (with the new Python test files). Pinning every job to the SHA that
`precondition` resolved rules this out.
The fix is also forward-looking. As Spark's release cadence and contributor
count grow, merge bursts only become more likely. Without pinning, "spurious
red CI on the previous PR every time someone merges a Connect feature" will
keep recurring.
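Back-of-the-envelope arithmetic can check the merge-burst argument. Assume, as a modeling assumption that the figures above do not establish, that pushes to master during a burst arrive as a Poisson process with rate r commits per minute. The probability that at least one push lands inside a drift window of w minutes is then `1 - exp(-r * w)`. The sketch below plugs in the figures quoted earlier: 3 to 6 commits per 20 minutes and a 17-minute window, with queue wait ignored.

```python
import math


def p_at_least_one(commits: float, per_minutes: float, window_minutes: float) -> float:
    """P(at least one push lands in the window), assuming Poisson arrivals.

    The rate is commits / per_minutes. Runner queue wait is left out of
    window_minutes, so the result understates the real probability.
    """
    rate = commits / per_minutes
    return 1.0 - math.exp(-rate * window_minutes)


for commits in (3, 6):
    p = p_at_least_one(commits, per_minutes=20, window_minutes=17)
    print(f"{commits} commits per 20 min, 17-min window: {p:.3f}")
# 3 commits per 20 min, 17-min window: 0.922
# 6 commits per 20 min, 17-min window: 0.994
```

Under this model, `1 - exp(-r * w)` grows with both the merge rate `r` and the window `w`. A busier master or a slower `precompile` therefore only makes a mixed-commit run more likely.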
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure only.
### How was this patch tested?
YAML syntax validated locally. CI will exercise the change end-to-end.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)
Closes #55879 from zhengruifeng/ci-pin-checkout-sha.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.github/workflows/build_and_test.yml | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index d8e5df4f9a88..66f3915e2a8e 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -68,6 +68,8 @@ jobs:
GITHUB_PREV_SHA: ${{ github.event.before }}
outputs:
required: ${{ steps.set-outputs.outputs.required }}
+ # Pinned so every downstream job checks out the same snapshot, even if `master` advances mid-run.
+ head_sha: ${{ steps.resolve-sha.outputs.head_sha }}
image_url: ${{ steps.infra-image-outputs.outputs.image_url }}
image_docs_url: ${{ steps.infra-image-docs-outputs.outputs.image_docs_url }}
image_docs_url_link: ${{ steps.infra-image-link.outputs.image_docs_url_link }}
@@ -84,6 +86,9 @@ jobs:
fetch-depth: 0
repository: apache/spark
ref: ${{ inputs.branch }}
+ - name: Resolve apache/spark HEAD SHA
+ id: resolve-sha
+ run: echo "head_sha=$(git rev-parse HEAD)" >> $GITHUB_OUTPUT
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -349,7 +354,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -467,7 +472,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -561,7 +566,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -683,7 +688,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -833,7 +838,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -922,7 +927,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -984,7 +989,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -1183,7 +1188,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Add GITHUB_WORKSPACE to git trust safe.directory
run: |
git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -1383,7 +1388,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -1500,7 +1505,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |
@@ -1568,7 +1573,7 @@ jobs:
with:
fetch-depth: 0
repository: apache/spark
- ref: ${{ inputs.branch }}
+ ref: ${{ needs.precondition.outputs.head_sha }}
- name: Sync the current branch with the latest in Apache Spark
if: github.repository != 'apache/spark'
run: |