The GitHub Actions job "Benchmarks PR Comment" on texera.git/main has succeeded. Run started by GitHub user ELin2025 (triggered by ELin2025).
Head commit for run:
8001e4c86e8d60971887ad7509b88c42a9fd1ad5 / Yicong Huang <[email protected]>
feat(bench): add Arrow Flight E2E benchmark + Benchmarks CI workflow (#5557)

### What changes were proposed in this PR?

A bench-agnostic CI lifecycle that future suites (e.g. JMH for `ArrowUtils` micros) plug into by appending one line to `bin/run-benchmarks.sh`, plus the first concrete suite: an end-to-end Arrow Flight + `PythonWorkflowWorker` micro-bench.

**Lifecycle**

| Trigger | Mode | PR comment | Publish to gh-pages |
|---|---|---|---|
| `pull_request` (label-gated, mirrors `amber-integration`'s set) | `pr`: 3 configs × 20 batches (~5 min) | ✓ | - |
| `push` to `main` | `pr` (post-merge fast signal) | - | ✓ |
| `schedule` Sundays 08:00 UTC | `full`: 36 configs × 200 batches (~50-60 min) | - | ✓ |
| `workflow_dispatch` | `full` | - | - |

PR runs upload the bench as an artifact + render a markdown summary table on the workflow page; the `workflow_run`-triggered `Benchmarks PR Comment` listener (separate file because `pull_request` from forks gets a read-only token and zero secret access) downloads the artifact, sanitizes the CSV, and upserts a single marker-tagged PR comment. Non-blocking: not part of `required-checks.yml`'s aggregator.

**First benchmark: Arrow Flight E2E (`ArrowFlightActorBench`)**

Spawns a real `PythonWorkflowWorker` actor (real Pekko mailbox + real `texera_run_python_worker.py` subprocess + real Arrow Flight gRPC) wired to an identity Python UDF, then times per-batch send→echo round-trip across a sweep of `batch_size × schema_width × string_len`. Per-config output: throughput (tuples/s, MB/s), latency p50/p95/p99, total ms. Each config writes incrementally so a killed sweep still leaves usable artifacts.

ASF: `benchmark-action/github-action-benchmark` is SHA-pinned to `52576c92bccf6ac60c8223ec7eb2565637cae9ba` (v1.22.1) per the apache-infrastructure-actions allow-list.

### Any related issues, documentation, discussions?

Closes #5556

### How was this PR tested?

End-to-end validated on a fork-internal PR: [Yicong-Huang/texera#17](https://github.com/Yicong-Huang/texera/pull/17) ran the full `Benchmarks` workflow, the `workflow_run` listener fired, and a marker-tagged comment landed and upserted across two push cycles ([rendered example](https://github.com/Yicong-Huang/texera/pull/17#issuecomment-4645589605)). `workflow_run` only listens on the default branch, so the loop can't be tested from a non-default branch. That's why the dry-run lived on a fork; after merge, the same flow takes effect on `apache/texera:main` automatically.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Report URL: https://github.com/apache/texera/actions/runs/27382772758

With regards,
GitHub Actions via GitBox
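
For readers unfamiliar with the per-config output named in the commit message (throughput in tuples/s and MB/s, latency p50/p95/p99, total ms), here is a minimal Python sketch of how such a summary could be derived from per-batch round-trip timings. It is not the code in `ArrowFlightActorBench`: the names `ConfigSummary`, `nearest_rank`, and `summarize` are hypothetical, the nearest-rank percentile method is an assumption (the message does not say which method the benchmark uses), and throughput is computed on the assumption that batches are sent one at a time, so total time is the sum of the round-trips.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSummary:
    """Hypothetical per-config result row; field names are illustrative."""
    tuples_per_s: float
    mb_per_s: float
    p50_ms: float
    p95_ms: float
    p99_ms: float
    total_ms: float


def nearest_rank(sorted_ms, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1..100."""
    rank = (pct * len(sorted_ms) + 99) // 100  # integer ceil(pct * n / 100)
    return sorted_ms[rank - 1]


def summarize(latencies_ms, batch_size, bytes_per_batch):
    """Summarize one config from its per-batch round-trip times in milliseconds.

    Assumes serial sends, so wall time is approximated by the sum of latencies.
    """
    if not latencies_ms:
        raise ValueError("need at least one timed batch")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    total_ms = sum(ordered)
    total_s = total_ms / 1000.0
    return ConfigSummary(
        tuples_per_s=n * batch_size / total_s,
        mb_per_s=n * bytes_per_batch / 1e6 / total_s,  # MB taken as 10^6 bytes
        p50_ms=nearest_rank(ordered, 50),
        p95_ms=nearest_rank(ordered, 95),
        p99_ms=nearest_rank(ordered, 99),
        total_ms=total_ms,
    )


if __name__ == "__main__":
    # 20 synthetic timings, matching the batch count of the `pr` mode.
    timings = [float(ms) for ms in range(1, 21)]
    print(summarize(timings, batch_size=100, bytes_per_batch=1_000_000))
```

With only 20 batches per config, as in `pr` mode, nearest-rank p99 coincides with the slowest batch, so tail latencies from the short run are coarse compared with the 200-batch `full` sweep.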
