This is an automated email from the ASF dual-hosted git repository.
dongjoon-hyun pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/spark-kubernetes-operator.git
The following commit(s) were added to refs/heads/main by this push:
new 87fdd88 [SPARK-57157] Harden fork CI status workflows against run/check-run race and pagination
87fdd88 is described below
commit 87fdd88259d8c25e1c50ab961436bf7297b2085b
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Fri May 29 18:42:52 2026 -0700
[SPARK-57157] Harden fork CI status workflows against run/check-run race and pagination
### What changes were proposed in this pull request?
Port the hardening fixes from Apache Spark (SPARK-57154, SPARK-57155) to this
repository's `notify_test_workflow.yml` and `update_build_status.yml`, which
were introduced here as a copy of Spark's fork-based CI status mechanism
(SPARK-57153).
`notify_test_workflow.yml`:
1. When listing the fork's workflow runs, instead of blindly taking the most
recent run (`workflow_runs[0]`) and throwing if its `head_sha` does not match
the PR head SHA, retry (up to 3 times, 3s apart) looking for the run whose
`head_sha` matches the PR head SHA. The listing endpoint orders by most recent,
so the run for the just-pushed SHA may not be registered yet and a stale run
from a previous push could be returned.
2. When resolving the `Run / License Check` check-run id (used only to render a
Check-run view link instead of the Actions view, see SPARK-37879), a missing
check-run no longer throws. The check-run materializes later than the workflow
run, especially when the matrix is queued, so this is now best-effort: if it
cannot be found, the `Build` check is still created pointing at the Actions
run URL.
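The two items above can be sketched outside the workflow as follows. This is a minimal illustration, not the workflow's code: `findMatchingRun`, `summaryUrl`, and the injected `listRuns` function are hypothetical names standing in for the fork's workflow-run listing request, and the attempt count and delay simply mirror the "up to 3 times, 3s apart" described above.

```javascript
// Hypothetical helper: listRuns is any async function returning an array of
// { id, head_sha } objects, standing in for the workflow-runs listing call.
async function findMatchingRun(listRuns, headSha, attempts = 3, delayMs = 3000) {
  let sawRuns = false;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    let runs = [];
    try {
      runs = await listRuns();
    } catch (error) {
      // Treat a failed listing as "no runs found" and try again.
      console.error(error);
    }
    if (runs.length > 0) {
      sawRuns = true;
      // Look for the run of this exact SHA rather than taking runs[0].
      const match = runs.find(r => r.head_sha === headSha);
      if (match) {
        return { run: match, sawRuns };
      }
    }
    if (attempt < attempts) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  return { run: undefined, sawRuns };
}

// Best-effort link: prefer the check-run view, fall back to the Actions run.
function summaryUrl(repoFullName, runId, checkRunId) {
  const base = 'https://github.com/' + repoFullName;
  return checkRunId !== undefined
    ? base + '/runs/' + checkRunId
    : base + '/actions/runs/' + runId;
}
```

A caller would distinguish `sawRuns && !run` (a newer commit was pushed) from `!sawRuns` (the fork has not run the workflow at all), which is the same split the patch makes.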
`update_build_status.yml`:
3. List a commit's check-runs with `github.paginate(..., per_page: 100)` instead
of a single un-paginated request, matching `notify_test_workflow.yml`. The
default page size is 30, so the target `Build` check could fall off the first
page on a SHA that accumulates more check-runs than that.
4. Wrap `JSON.parse(cr.output.text)` in try/catch and `continue` on failure, so a
`Build` check with empty or malformed output text does not abort the whole
scheduled run and block updates for every PR queued behind it.
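To illustrate why both changes matter, here is a small self-contained sketch. It is an assumption-laden stand-in, not the workflow's code: `listAll` and `fetchPage` are hypothetical and use simple page numbers, and they do not reproduce how `github.paginate` works internally. `parseBuildParams` mirrors the skip-on-parse-failure behaviour described in item 4.

```javascript
// Hypothetical stand-in for a paginated listing: fetchPage(page, perPage)
// returns one page of items; a short (or empty) page ends the loop.
async function listAll(fetchPage, perPage = 100) {
  const all = [];
  for (let page = 1; ; page++) {
    const items = await fetchPage(page, perPage);
    all.push(...items);
    if (items.length < perPage) {
      return all;
    }
  }
}

// Parse the JSON stored in each Build check's output text, skipping any
// check whose text is empty or malformed instead of aborting the loop.
function parseBuildParams(checkRuns) {
  const parsed = [];
  for (const cr of checkRuns) {
    if (cr.name !== 'Build' || cr.conclusion === 'action_required') {
      continue;
    }
    let params;
    try {
      params = JSON.parse(cr.output.text);
    } catch (error) {
      console.error('Skipping Build check ' + cr.id + ' with unparseable output text');
      continue;
    }
    parsed.push({ id: cr.id, params });
  }
  return parsed;
}
```

With a page size of 30 and 35 check-runs on a SHA, a `Build` check in position 35 is only visible once every page has been collected, and one unparseable `Build` check earlier in the list no longer prevents it from being processed.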
### Why are the changes needed?
The race conditions previously left a PR with no `Build` check at all, and the
scheduled updater only syncs existing checks, so the PR had no status reported
until the next push. The pagination and parsing issues silently block status
updates, leaving PRs stuck in `queued`.
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure only.
### How was this patch tested?
Static verification: the embedded `actions/github-script` bodies pass
`node --check`, and the workflow YAML parses.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)
Closes #699 from viirya/SPARK-57157-harden-ci-status.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.github/workflows/notify_test_workflow.yml | 94 +++++++++++++++++++-----------
.github/workflows/update_build_status.yml | 36 +++++++++---
2 files changed, 88 insertions(+), 42 deletions(-)
diff --git a/.github/workflows/notify_test_workflow.yml b/.github/workflows/notify_test_workflow.yml
index 03debf0..2a99587 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -61,22 +61,48 @@ jobs:
console.log('Ref: ' + context.payload.pull_request.head.ref)
console.log('SHA: ' + context.payload.pull_request.head.sha)
+ const name = 'Build'
+ const head_sha = context.payload.pull_request.head.sha
+ let status = 'queued'
+
// Wait 3 seconds to make sure the fork repository triggered a workflow.
await new Promise(r => setTimeout(r, 3000))
- let runs
- try {
- runs = await github.request(endpoint, params)
- } catch (error) {
- console.error(error)
- // Assume that runs were not found.
+ // The workflow run for this exact SHA may not be registered yet, and the
+ // listing endpoint orders by most recent, so blindly taking the first run
+ // can return a stale run from a previous push. Re-query a few times looking
+ // for the run whose head_sha matches this PR's head SHA before giving up.
+ let matched_run
+ let any_runs = false
+ let retryCount = 0
+ while (retryCount < 3) {
+ let runs
+ try {
+ runs = await github.request(endpoint, params)
+ } catch (error) {
+ console.error(error)
+ // Assume that runs were not found.
+ }
+ if (runs && runs.data.workflow_runs.length > 0) {
+ any_runs = true
+ matched_run = runs.data.workflow_runs.find(r => r.head_sha == head_sha)
+ if (matched_run) {
+ break
+ }
+ }
+ retryCount++
+ if (retryCount < 3) {
+ await new Promise(resolve => setTimeout(resolve, 3000))
+ }
}
- const name = 'Build'
- const head_sha = context.payload.pull_request.head.sha
- let status = 'queued'
+ // If we saw runs but none matched the PR head SHA, a newer commit was pushed
+ // and a fresh notify run will handle it; nothing useful to report for this SHA.
+ if (any_runs && !matched_run) {
+ throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
+ }
- if (!runs || runs.data.workflow_runs.length === 0) {
+ if (!matched_run) {
status = 'completed'
const conclusion = 'action_required'
@@ -108,18 +134,26 @@ jobs:
}
})
} else {
- const run_id = runs.data.workflow_runs[0].id
+ const run_id = matched_run.id
- if (runs.data.workflow_runs[0].head_sha != context.payload.pull_request.head.sha) {
- throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
- }
+ const actions_url = 'https://github.com/'
+ + context.payload.pull_request.head.repo.full_name
+ + '/actions/runs/'
+ + run_id
+ console.log('Actions URL: ' + actions_url)
- // Here we get check run ID to provide Check run view instead of Actions view, see also SPARK-37879.
+ // Here we get the check run ID to provide a Check run view instead of the
+ // Actions view, see also SPARK-37879. The check run may not have materialized
+ // yet (it is created later than the workflow run, especially when the matrix
+ // is queued), so this is best-effort: if it cannot be found, we fall back to
+ // the Actions run URL rather than failing and leaving the PR with no Build
+ // check for the scheduled updater to sync.
let retryCount = 0;
let check_run_head;
while (retryCount < 3) {
const check_runs = await github.request(check_run_endpoint, check_run_params);
- check_run_head = check_runs.data.check_runs.find(r => r.name === "Run / License Check");
+ check_run_head = check_runs.data.check_runs.find(
+ r => r.name === "Run / License Check" && r.head_sha == head_sha);
if (check_run_head) {
break;
}
@@ -128,26 +162,18 @@ jobs:
await new Promise(resolve => setTimeout(resolve, 3000));
}
}
- if (!check_run_head) {
- throw new Error('Failed to retrieve check_run_head after 3 attempts');
- }
- if (check_run_head.head_sha != context.payload.pull_request.head.sha) {
- throw new Error('There was a new unsynced commit pushed. Please retrigger the workflow.');
+ let summary_url = actions_url
+ if (check_run_head) {
+ summary_url = 'https://github.com/'
+ + context.payload.pull_request.head.repo.full_name
+ + '/runs/'
+ + check_run_head.id
+ console.log('Check run URL: ' + summary_url)
+ } else {
+ console.log('Check run not found; falling back to Actions URL: ' + actions_url)
}
- const check_run_url = 'https://github.com/'
- + context.payload.pull_request.head.repo.full_name
- + '/runs/'
- + check_run_head.id
- console.log('Check run URL: ' + check_run_url)
-
- const actions_url = 'https://github.com/'
- + context.payload.pull_request.head.repo.full_name
- + '/actions/runs/'
- + run_id
- console.log('Actions URL: ' + actions_url)
-
github.rest.checks.create({
owner: context.repo.owner,
repo: context.repo.repo,
@@ -156,7 +182,7 @@ jobs:
status: status,
output: {
title: 'Test results',
- summary: '[See test results](' + check_run_url + ')\n\n'
+ summary: '[See test results](' + summary_url + ')\n\n'
+ 'If the tests fail for reasons unrelated to this pull request, '
+ 'please rerun the workflow in your forked repository.\n'
+ 'If the failures are related to this pull request, '
diff --git a/.github/workflows/update_build_status.yml b/.github/workflows/update_build_status.yml
index 26ab78f..6b16c59 100644
--- a/.github/workflows/update_build_status.yml
+++ b/.github/workflows/update_build_status.yml
@@ -53,17 +53,37 @@ jobs:
console.log('SHA: ' + pr.head.sha)
console.log(' Mergeable status: ' + pr.mergeable_state)
if (pr.mergeable_state == null || maybeReady.includes(pr.mergeable_state)) {
- const checkRuns = await github.request('GET /repos/{owner}/{repo}/commits/{ref}/check-runs', {
- owner: context.repo.owner,
- repo: context.repo.repo,
- ref: pr.head.sha
- })
+ // Paginate with per_page=100 to match notify_test_workflow.yml. The default
+ // page size is 30, and a SHA can accumulate more check-runs than that (CI
+ // matrix, external checks, duplicate Build checks from reopened PRs), which
+ // could push the target Build check off the first page and leave the PR
+ // stuck in 'queued' forever.
+ const checkRuns = await github.paginate(
+ 'GET /repos/{owner}/{repo}/commits/{ref}/check-runs',
+ {
+ owner: context.repo.owner,
+ repo: context.repo.repo,
+ ref: pr.head.sha,
+ per_page: 100
+ }
+ )
// Iterator GitHub Checks in the PR
- for await (const cr of checkRuns.data.check_runs) {
+ for await (const cr of checkRuns) {
if (cr.name == 'Build' && cr.conclusion != "action_required") {
- // text contains parameters to make request in JSON.
- const params = JSON.parse(cr.output.text)
+ // text contains parameters to make request in JSON. A Build check
+ // created by something other than notify_test_workflow.yml (an older
+ // version, a manual run, or another app) may have empty or malformed
+ // output text; skip it instead of aborting the whole scheduled run,
+ // which would block updates for every PR queued behind it.
+ let params
+ try {
+ params = JSON.parse(cr.output.text)
+ } catch (error) {
+ console.error('Skipping Build check ' + cr.id + ' with unparseable output text')
+ console.error(error)
+ continue
+ }
// Get the workflow run in the forked repository
let run