[PR] ci: integrate CodSpeed continuous benchmarking [arrow-rs]

via GitHub Thu, 14 May 2026 11:41:38 -0700


adriangb opened a new pull request, #9975:
URL: https://github.com/apache/arrow-rs/pull/9975


   ## Summary
   
   Wires the existing criterion benches in this workspace into 
[CodSpeed](https://codspeed.io) for continuous performance tracking. CodSpeed 
runs benches under CPU simulation in CI and posts per-PR comparison reports 
against the most recent baseline run on `main`.
   
   Benchmarking is opt-in once activated: the PR workflow only runs when a 
maintainer adds a `bench:*` label, so external contributors don't blindly burn 
CI capacity. The main-push workflow keeps the baseline current.
   
   The integration has been validated end-to-end on a fork 
([`pydantic/arrow-rs`](https://github.com/pydantic/arrow-rs)): 3031 benchmarks 
captured from a single main run, PR runs produce clean comparison comments 
(e.g. *"Merging this PR will not alter performance — ✅ 7 untouched benchmarks, 
⏩ 3024 skipped benchmarks, comparing codspeed-smoke-test (5b1320a) with main 
(fcbe248)"*). Public dashboard: https://codspeed.io/pydantic/arrow-rs
   
   ## Design
   
   ### Drop-in shim, no bench source changes
   
   The `criterion` workspace dependency is renamed (via Cargo's `package` 
dependency key) to `codspeed-criterion-compat`. This is a CodSpeed-maintained 
passthrough: when not running under `cargo codspeed`, it forwards to real 
criterion, so `cargo bench` locally is unchanged and every existing `use 
criterion::*` in every bench source file compiles unmodified.
   
   ```toml
   # Cargo.toml (workspace)
   criterion = { package = "codspeed-criterion-compat", version = "4.6", 
default-features = false }
   ```
   
   ### Sharded: one job per `[[bench]]` target
   
   Required for two reasons:
   1. The full workspace produces well over 1000 individual benchmarks 
(criterion parameterizes heavily), which exceeds CodSpeed's [per-upload 
limit](https://codspeed.io/docs/features/sharded-benchmarks).
   2. Even the `parquet` crate alone exceeds 1000 — per-crate sharding wasn't 
fine enough.
   
   Jobs within a single workflow are auto-aggregated by CodSpeed into one 
unified report.
   
   ### Build once, run many
   
   ```
   setup ─┐
          ├──→ bench (matrix, ~78 shards)
   build ─┘
   ```
   
   - `setup` parses every workspace member's `Cargo.toml` for `[[bench]]` 
entries (awk + jq), emits a JSON `{crate, bench}` array; new bench targets are 
picked up automatically.
   - `build` runs the full-workspace `cargo codspeed build` exactly once, packs 
`target/codspeed/` into a tarball (tar preserves the +x bit; 
`actions/upload-artifact` strips it otherwise), uploads as a 1-day artifact.
   - Each bench shard downloads the artifact, unpacks it, runs `cargo codspeed 
run -p <crate> --bench <bench>`. No per-shard rebuild — CI cost scales with N × 
~2 min instead of N × full build.
   
   ### Label-gated PRs
   
   `codspeed-pr.yml` fires on `pull_request: [labeled, synchronize, opened, 
reopened]` and only runs when the PR has at least one `bench:*` label:
   
   | Label                             | Effect                                |
   | --------------------------------- | ------------------------------------- |
   | `bench:all`                       | Every `[[bench]]` in the workspace    |
   | `bench:<crate>`                   | Every `[[bench]]` in that crate       |
   | `bench:<crate-a> bench:<crate-b>` | Union                                 |
   
   Label suffixes are validated against `^[a-z][a-z0-9_-]*$`. Authorization is 
implicit: only users with write access can add labels.
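
A minimal sketch of the label handling described above, assuming that labels with an invalid suffix are silently ignored (the workflow may instead fail the job). The function name `select_crates` is hypothetical.

```python
import re

LABEL_PREFIX = "bench:"
# Pattern quoted in the PR description for label suffixes.
SUFFIX_RE = re.compile(r"^[a-z][a-z0-9_-]*$")


def select_crates(labels: list[str]) -> tuple[bool, set[str]]:
    """Return (run_all, crates) for the labels attached to a PR.

    Labels without the `bench:` prefix are ignored. Suffixes that fail the
    pattern are dropped (an assumption) so they never reach a shell command.
    """
    run_all = False
    crates: set[str] = set()
    for label in labels:
        if not label.startswith(LABEL_PREFIX):
            continue
        suffix = label[len(LABEL_PREFIX):]
        if not SUFFIX_RE.fullmatch(suffix):
            continue
        if suffix == "all":
            run_all = True
        else:
            crates.add(suffix)  # set semantics give the union of crates
    return run_all, crates
```

For example, `select_crates(["bench:arrow-cast", "bench:parquet", "bug"])` selects the two crates and ignores the unrelated label, while a PR with no `bench:*` label yields `(False, set())` and nothing runs.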
   
   While the label is attached, every push to the PR re-runs the suite 
(`synchronize` event); re-runs cancel in-progress shards via `concurrency: 
cancel-in-progress: true`.
   
   ### OIDC auth
   
   Public repo, so no `CODSPEED_TOKEN` secret is required: the workflow's 
`id-token: write` permission lets it request a GitHub OIDC token, which is what 
CodSpeed verifies. Workflows are repo-agnostic.
   
   ## Exclusions
   
   Ten bench targets currently fail at runtime in this workspace — pre-existing 
issues in the bench targets themselves, not the integration. They're listed in 
an `EXCLUDED_BENCHES` env in both workflows so the remaining ~78 shards run 
clean. Each excluded target should be fixed (or removed) and dropped from the 
list one by one:
   
   | Target | Observed failure mode |
   | --- | --- |
   | `arrow / merge_kernels` | panics at `arrow-data/src/transform/primitive.rs:31:43` |
   | `arrow / buffer_bit_ops` | runtime error |
   | `arrow / buffer_create` | runtime error |
   | `arrow / sort_kernel` | runtime error |
   | `arrow / string_run_builder` | runtime error |
   | `arrow / primitive_run_accessor` | runtime error |
   | `arrow-array / union_array` | runtime error |
   | `arrow-cast / parse_date` | runtime error |
   | `parquet / row_selection_cursor` | runtime error |
   | `parquet-variant-compute / variant_kernels` | intermittent |
   
   I'm happy to file separate upstream issues for each if helpful — or to drop 
the exclusion list entirely if maintainers prefer to investigate them all at 
once. The same `merge_kernels` exclusion was added by the official CodSpeed 
wizard's auto-generated PR (https://codspeed.io/docs/get-started/wizard), so 
this is consistent prior art.
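
The effect of the `EXCLUDED_BENCHES` list on the shard matrix can be sketched as below. The format of the variable is not shown in this description, so whitespace-separated `crate/bench` entries are an assumption here, as are the helper names `parse_excluded` and `apply_exclusions` and the `example_bench` target.

```python
def parse_excluded(env_value: str) -> set[tuple[str, str]]:
    """Parse 'crate/bench crate/bench ...' (assumed format) into pairs."""
    pairs = set()
    for entry in env_value.split():
        crate, _, bench = entry.partition("/")
        pairs.add((crate, bench))
    return pairs


def apply_exclusions(shards: list[dict[str, str]],
                     excluded: set[tuple[str, str]]) -> list[dict[str, str]]:
    """Drop every {crate, bench} shard that appears in the exclusion set."""
    return [s for s in shards if (s["crate"], s["bench"]) not in excluded]


# Two entries taken from the table above; the third shard is hypothetical.
EXCLUDED_BENCHES = "arrow/merge_kernels arrow-cast/parse_date"
shards = [
    {"crate": "arrow", "bench": "merge_kernels"},
    {"crate": "arrow-cast", "bench": "parse_date"},
    {"crate": "arrow-cast", "bench": "example_bench"},
]
remaining = apply_exclusions(shards, parse_excluded(EXCLUDED_BENCHES))
```

Removing an entry from the list re-enables that shard without any other workflow change, which is what makes dropping targets one by one practical.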
   
   ## Prerequisites for activation
   
   This PR adds the workflow files but they're inert until two repo-admin 
actions land:
   
   1. **Install the [CodSpeed GitHub App](https://github.com/apps/codspeed)** 
on `apache/arrow-rs`. This is what posts the PR comparison comment + status 
check.
   2. **Enroll the repository at https://codspeed.io**. OIDC is automatic for 
public repos — no secret token configuration required.
   
   Once both are done, the first push to `main` will populate the baseline and 
PRs labeled `bench:*` will receive automated comparison comments.
   
   ## CI cost notes
   
   - Main-push workflow: 1 build + 78 shards. Build job dominates wallclock 
(~10 min); shards run in parallel and download from one artifact, ~2 min each.
   - PR workflow: same `build`, but only the bench shards for the labeled 
crates. A typical `bench:arrow-cast` run is build + 3 shards.
   - Per-target bench binaries are bundled in one ~1-2 GB artifact (well under 
GitHub's 5 GB free-tier limit).
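
As a back-of-the-envelope check of these figures, the sketch below compares total runner-minutes for building once against rebuilding in every shard. It treats the approximate ~10 min build and ~2 min shard times as exact and ignores artifact upload/download and queueing, all of which are simplifying assumptions; the function names are hypothetical.

```python
BUILD_MIN = 10  # approximate full-workspace build time quoted above
RUN_MIN = 2     # approximate per-shard run time quoted above


def build_once_minutes(shards: int, build: int = BUILD_MIN, run: int = RUN_MIN) -> int:
    """Total runner-minutes when one build job feeds every shard."""
    return build + shards * run


def build_per_shard_minutes(shards: int, build: int = BUILD_MIN, run: int = RUN_MIN) -> int:
    """Total runner-minutes if every shard rebuilt the workspace itself."""
    return shards * (build + run)


main_push = build_once_minutes(78)       # 10 + 78 * 2 = 166
naive = build_per_shard_minutes(78)      # 78 * (10 + 2) = 936
arrow_cast_pr = build_once_minutes(3)    # 10 + 3 * 2 = 16
```

Under these assumptions a main-push run costs about 166 runner-minutes rather than about 936, and a `bench:arrow-cast` PR run costs about 16.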
   
   ## Test plan
   
   - [x] `cargo check --workspace --benches --features arrow/test_utils,arrow-schema/ffi,parquet/test_common,parquet/experimental,parquet/async,parquet/object_store` passes against this branch
   - [x] End-to-end validation on `pydantic/arrow-rs`: main baseline run 
captured 3031 benchmarks; PR run posts comparison comment correctly; per-shard 
sharding stays under the 1000-benchmark limit
   - [ ] After merge and CodSpeed-App install on `apache/arrow-rs`, first main 
run populates baseline at https://codspeed.io/apache/arrow-rs
   - [ ] Create the `bench:all` and per-crate `bench:<crate>` labels in repo 
settings
   - [ ] Add `bench:<crate>` to a real PR; confirm comparison comment + status 
check appear
   
   ## References
   
   - CodSpeed docs: https://codspeed.io/docs
   - Sharded benchmarks: https://codspeed.io/docs/features/sharded-benchmarks
   - Compat shim source: https://github.com/CodSpeedHQ/codspeed-rust
   - Prior auto-generated wizard PR on pydantic fork: 
https://github.com/pydantic/arrow-rs/pull/11 (single-shard; hit the >1000 
limit, which this PR resolves)
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


