Ma77Ball commented on code in PR #5671:
URL: https://github.com/apache/texera/pull/5671#discussion_r3408759814
##########
.github/workflows/benchmarks.yml:
##########
@@ -235,6 +246,61 @@ jobs:
# inside bin/run-benchmarks.sh and adding a publish step below.
run: bash bin/run-benchmarks.sh
+ - name: Benchmark main baseline in the same runner
+ # PR only: re-run the IDENTICAL trimmed grid against the base-branch
+ # (main) commit this PR targets, in THIS runner, right after the PR
+ # run above. Comparing two runs from the same machine cancels the
+ # cross-runner hardware variance that otherwise dominates CI bench
+ # deltas, so benchmarks-pr-comment.yml can show a trustworthy
+ # main-vs-branch comparison instead of PR-here vs a stored baseline
+ # captured on some other runner.
+ #
+ # The output convention is preserved: the PR's own outputs stay in
+ # bench-results/ untouched; we only ADD main's CSV as
+ # arrow-flight-e2e-main.csv (plus the base SHA in a sidecar file).
+ # The PR-mode grid is deterministic (see GridSpec in
+ # ArrowFlightActorBench.scala), so main's rows key 1:1 against the
+ # PR's rows for the comparison.
+ #
+ # Fail-soft by construction: no `set -e`, and a trap restores the
+ # PR's results plus the original checkout no matter where the main
+ # re-run dies (broken main, compile error, etc). On failure we emit
+ # no main CSV, and the comment workflow falls back to the stored
+ # gh-pages baseline. We also skip entirely if the PR run produced no
+ # CSV (e.g. the bench itself failed upstream).
+ if: ${{ github.event_name == 'pull_request' && !cancelled() }}
+ env:
+ BASE_SHA: ${{ github.event.pull_request.base.sha }}
+ run: |
+ set -uo pipefail
+ if [ ! -f bench-results/arrow-flight-e2e.csv ]; then
+ echo "::warning::no PR bench CSV; skipping same-runner main baseline."
+ exit 0
+ fi
+ ORIG_REF=$(git rev-parse HEAD)
+ # Park the PR's outputs; main's re-run writes a fresh bench-results/.
+ mv bench-results bench-results-pr
+ restore() {
+ rm -rf bench-results
+ mv bench-results-pr bench-results 2>/dev/null || true
+ git checkout --force "$ORIG_REF" 2>/dev/null || true
+ }
+ trap restore EXIT
+ if ! git checkout --force "$BASE_SHA"; then
+ echo "::warning::could not check out base SHA $BASE_SHA; skipping main baseline."
+ exit 0
+ fi
+ # Regenerate proto bindings against main's protos, then re-bench.
+ bash bin/python-proto-gen.sh || { echo "::warning::main proto-gen failed; skipping main baseline."; exit 0; }
+ if bash bin/run-benchmarks.sh && [ -f bench-results/arrow-flight-e2e.csv ]; then
+ cp bench-results/arrow-flight-e2e.csv bench-results-pr/arrow-flight-e2e-main.csv
+ printf '%s' "$BASE_SHA" > bench-results-pr/arrow-flight-e2e-main.commit.txt
+ echo "captured same-runner main baseline at $BASE_SHA"
+ else
+ echo "::warning::main baseline re-run failed; PR comment falls back to the gh-pages baseline."
Review Comment:
Added a `uv pip install` of main's
`requirements.txt`/`operator-requirements.txt`/`dev-requirements.txt` after the
base checkout, so the baseline now runs against main's Python deps. (Scala
already recompiles via sbt.) The step is fail-soft: if the install fails, the
baseline is skipped.
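
For reference, a minimal sketch of what such a fail-soft install step could look like. It is not the exact code in the PR: the `install_base_deps` function name, the `REQ_DIR` location of the requirements files, and the `PIP_INSTALL` override are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: install the base branch's Python deps, fail-soft.
# REQ_DIR (where the requirements files live) and PIP_INSTALL (the
# installer command) are illustrative assumptions, not names from the PR.
set -uo pipefail

REQ_DIR="${REQ_DIR:-.}"
PIP_INSTALL="${PIP_INSTALL:-uv pip install}"

install_base_deps() {
  local f
  for f in requirements.txt operator-requirements.txt dev-requirements.txt; do
    # $PIP_INSTALL is left unquoted on purpose so "uv pip install"
    # splits into the command and its subcommands.
    if ! $PIP_INSTALL -r "$REQ_DIR/$f"; then
      echo "::warning::installing main's $f failed; skipping main baseline."
      return 1
    fi
  done
}
```

In the workflow step this would be called right after the base checkout as `install_base_deps || exit 0`, so a failed install exits the step cleanly and the existing `trap restore EXIT` still puts the PR's results and checkout back.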