This is an automated email from the ASF dual-hosted git repository.

zhengruifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 869adad659f8 [SPARK-56866][INFRA] Pin downstream actions/checkout to a single resolved SHA
869adad659f8 is described below

commit 869adad659f8ce5c449daba4123f779f76b41ba6
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Tue May 19 08:58:23 2026 +0800

    [SPARK-56866][INFRA] Pin downstream actions/checkout to a single resolved SHA
    
    ### What changes were proposed in this pull request?
    
    In `.github/workflows/build_and_test.yml`, add a step to the `precondition` job that captures `git rev-parse HEAD` right after the apache/spark checkout and exposes it as a `head_sha` output, then switch every downstream `actions/checkout` from `ref: ${{ inputs.branch }}` to `ref: ${{ needs.precondition.outputs.head_sha }}`. The `precondition` job's own checkout still resolves `inputs.branch`; the 11 downstream checkouts (`build`, `infra-image`, `precompile`, `pyspark`, `sparkr`, `buf`, ` [...]
    
    ### Why are the changes needed?
    
    Today each `actions/checkout` step independently re-resolves `ref: ${{ inputs.branch }}` (default `master`) at the moment the runner picks it up. Different jobs in the same workflow run can therefore end up testing different commits.
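    
    The drift can be reproduced outside CI. The following is a minimal local sketch, not part of this patch: it uses a throwaway repository in place of apache/spark, and the variable names `by_branch` and `by_sha` are illustrative. It resolves `master` once (as `precondition` does), advances the branch, and then compares a lookup by branch name with a lookup by the saved SHA.
    
    ```shell
    set -eu
    
    # Throwaway repository standing in for apache/spark (assumption for the demo).
    repo="$(mktemp -d)"
    cd "$repo"
    git init -q
    # Name the branch explicitly so the demo does not depend on git's default branch.
    git symbolic-ref HEAD refs/heads/master
    
    commit() {
      # Empty commits are enough here; only the position of the branch tip matters.
      git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q --allow-empty -m "$1"
    }
    
    commit "commit A"
    # What the precondition job does: resolve the branch once and keep the SHA.
    head_sha="$(git rev-parse HEAD)"
    
    # master advances while downstream jobs are still waiting for a runner.
    commit "commit B"
    
    # A downstream job that resolves the branch name sees the newer commit...
    by_branch="$(git rev-parse master)"
    # ...while one that resolves the saved SHA still sees the original snapshot.
    by_sha="$(git rev-parse "$head_sha")"
    
    echo "pinned:    $by_sha"
    echo "by branch: $by_branch"
    ```
    
    Resolving by branch name returns the newer commit, while the saved SHA keeps pointing at the snapshot seen first, which is the property the pinned `ref` relies on.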
    
    **This is a long-standing issue.** `ref: ${{ inputs.branch }}` has been in `build_and_test.yml` since commit `9e468cf010f` (SPARK-39521, 2022-06-21), nearly four years, and the race has existed the entire time. It usually goes unnoticed because a normal master commit doesn't cross the JVM/Python boundary, so even when jobs do see different commits the tests stay consistent within each job.
    
    **It becomes a real problem during merge bursts.** Commits per hour on master vary wildly; release-prep windows, end-of-week merges, and APAC + EU overlap regularly push 3–6 commits in 20 minutes. The drift window for `pyspark` jobs is structurally ~17 minutes (`precompile` time) plus runner queue wait, so during a merge burst the probability that at least one commit lands inside that window approaches 1. When the unlucky commit happens to add a
tightly-coupled change — new Spark Con [...]
    
    ```
    [CONNECT_INVALID_PLAN.INVALID_ONE_OF_FIELD_NOT_SET]
    The Spark Connect plan is invalid. This oneOf field in spark.connect.Relation is not set: RELTYPE_NOT_SET
    ```
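    
    As a rough sanity check of the "approaches 1" claim, assume merges arrive as a Poisson process (a modelling assumption made here, not something measured in this PR). The probability of at least one commit inside a window of `w` minutes, at a rate of `n` commits per `per` minutes, is then `1 - exp(-(n / per) * w)`. The helper name `drift_probability` is hypothetical.
    
    ```shell
    # P(at least one commit lands inside the drift window), assuming Poisson arrivals.
    # usage: drift_probability COMMITS PER_MINUTES WINDOW_MINUTES
    drift_probability() {
      LC_ALL=C awk -v n="$1" -v per="$2" -v w="$3" \
        'BEGIN { printf "%.3f\n", 1 - exp(-(n / per) * w) }'
    }
    
    drift_probability 3 20 17   # prints 0.922
    drift_probability 6 20 17   # prints 0.994
    ```
    
    With the burst rates quoted above and the 17-minute `precompile` window alone (runner queue wait excluded), the estimate is already about 0.92 to 0.99.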
    
    Concrete example from 2026-05-14:
    - Run [25835824862](https://github.com/apache/spark/actions/runs/25835824862) triggered by `e19bc35c` (SPARK-56844): `pyspark-connect` failed with 19 NEAREST BY errors.
    - Run [25835929554](https://github.com/apache/spark/actions/runs/25835929554) triggered ~3 minutes later by the next commit `13380e78` (SPARK-56395, which added the NEAREST BY feature): same job passed.
    
    The first run's `precompile` checked out `e19bc35c` (no NEAREST BY server code), but by the time its `pyspark-connect` job actually started 17 minutes later, master was at `13380e78` and `actions/checkout` resolved that newer commit (with the new Python test files). The job therefore ran the new Python tests against a build that lacked the feature. Pinning every job to the SHA that `precondition` resolved rules out this mismatch.
    
    The fix is also forward-looking: as Spark's release cadence and contributor count grow, the merge-burst probability only goes up; without pinning, "spurious red CI on the previous PR every time someone merges a Connect feature" will keep recurring.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. CI infrastructure only.
    
    ### How was this patch tested?
    
    YAML syntax validated locally. CI will exercise the change end-to-end.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (claude-opus-4-7)
    
    Closes #55879 from zhengruifeng/ci-pin-checkout-sha.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 .github/workflows/build_and_test.yml | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index d8e5df4f9a88..66f3915e2a8e 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -68,6 +68,8 @@ jobs:
       GITHUB_PREV_SHA: ${{ github.event.before }}
     outputs:
       required: ${{ steps.set-outputs.outputs.required }}
+      # Pinned so every downstream job checks out the same snapshot, even if `master` advances mid-run.
+      head_sha: ${{ steps.resolve-sha.outputs.head_sha }}
       image_url: ${{ steps.infra-image-outputs.outputs.image_url }}
       image_docs_url: ${{ steps.infra-image-docs-outputs.outputs.image_docs_url }}
       image_docs_url_link: ${{ steps.infra-image-link.outputs.image_docs_url_link }}
@@ -84,6 +86,9 @@ jobs:
         fetch-depth: 0
         repository: apache/spark
         ref: ${{ inputs.branch }}
+    - name: Resolve apache/spark HEAD SHA
+      id: resolve-sha
+      run: echo "head_sha=$(git rev-parse HEAD)" >> $GITHUB_OUTPUT
     - name: Sync the current branch with the latest in Apache Spark
       if: github.repository != 'apache/spark'
       run: |
@@ -349,7 +354,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Sync the current branch with the latest in Apache Spark
       if: github.repository != 'apache/spark'
       run: |
@@ -467,7 +472,7 @@ jobs:
         with:
           fetch-depth: 0
           repository: apache/spark
-          ref: ${{ inputs.branch }}
+          ref: ${{ needs.precondition.outputs.head_sha }}
       - name: Sync the current branch with the latest in Apache Spark
         if: github.repository != 'apache/spark'
         run: |
@@ -561,7 +566,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Sync the current branch with the latest in Apache Spark
       if: github.repository != 'apache/spark'
       run: |
@@ -683,7 +688,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Add GITHUB_WORKSPACE to git trust safe.directory
       run: |
         git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -833,7 +838,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Add GITHUB_WORKSPACE to git trust safe.directory
       run: |
         git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -922,7 +927,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Sync the current branch with the latest in Apache Spark
       if: github.repository != 'apache/spark'
       run: |
@@ -984,7 +989,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Add GITHUB_WORKSPACE to git trust safe.directory
       run: |
         git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -1183,7 +1188,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Add GITHUB_WORKSPACE to git trust safe.directory
       run: |
         git config --global --add safe.directory ${GITHUB_WORKSPACE}
@@ -1383,7 +1388,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Sync the current branch with the latest in Apache Spark
       if: github.repository != 'apache/spark'
       run: |
@@ -1500,7 +1505,7 @@ jobs:
       with:
         fetch-depth: 0
         repository: apache/spark
-        ref: ${{ inputs.branch }}
+        ref: ${{ needs.precondition.outputs.head_sha }}
     - name: Sync the current branch with the latest in Apache Spark
       if: github.repository != 'apache/spark'
       run: |
@@ -1568,7 +1573,7 @@ jobs:
         with:
           fetch-depth: 0
           repository: apache/spark
-          ref: ${{ inputs.branch }}
+          ref: ${{ needs.precondition.outputs.head_sha }}
       - name: Sync the current branch with the latest in Apache Spark
         if: github.repository != 'apache/spark'
         run: |

