adriangb opened a new pull request, #9975: URL: https://github.com/apache/arrow-rs/pull/9975
## Summary

Wires the existing criterion benches in this workspace into [CodSpeed](https://codspeed.io) for continuous performance tracking. CodSpeed runs benches under CPU simulation in CI and posts per-PR comparison reports against the latest `main` run.

This PR is opt-in once activated: the PR workflow only fires when a maintainer adds a `bench:*` label, so external contributors don't blindly burn CI capacity. The main-push workflow keeps the baseline current.

The integration has been validated end-to-end on a fork ([`pydantic/arrow-rs`](https://github.com/pydantic/arrow-rs)): 3031 benchmarks were captured from a single main run, and PR runs produce clean comparison comments (e.g. *"Merging this PR will not alter performance"*, with ✅ 7 untouched benchmarks and ⏩ 3024 skipped benchmarks, comparing `codspeed-smoke-test` (5b1320a) with `main` (fcbe248)).

Public dashboard: https://codspeed.io/pydantic/arrow-rs

## Design

### Drop-in shim, no bench source changes

The `criterion` workspace dependency is renamed (via Cargo's `package = "..."` dependency key) to `codspeed-criterion-compat`. This is a CodSpeed-maintained passthrough: when not running under `cargo codspeed`, it forwards to real criterion, so `cargo bench` locally is unchanged and every existing `use criterion::*` in every bench source file compiles unmodified.

```toml
# Cargo.toml (workspace)
criterion = { package = "codspeed-criterion-compat", version = "4.6", default-features = false }
```

### Sharded one job per `[[bench]]` target

Required for two reasons:

1. The full workspace produces well over 1000 individual benchmarks (criterion parameterizes heavily), which exceeds CodSpeed's [per-upload limit](https://codspeed.io/docs/features/sharded-benchmarks).
2. Even the `parquet` crate alone exceeds 1000, so per-crate sharding wasn't fine enough.

Jobs within a single workflow are auto-aggregated by CodSpeed into one unified report.
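To make "one job per `[[bench]]` target" concrete, here is a minimal Rust sketch of turning one crate manifest into a list of `{crate, bench}` shards. It is an illustration only, not the PR's implementation (the workflows do this with awk + jq). The names `Shard`, `discover_shards`, `to_json`, and the `demo-crate` manifest with its bench names are all hypothetical. The scan is line-based rather than a real TOML parser, so it assumes simple `name = "..."` lines directly under `[package]` and `[[bench]]`.

```rust
/// One CI shard: a single `[[bench]]` target in a single crate.
#[derive(Debug, PartialEq)]
struct Shard {
    krate: String,
    bench: String,
}

/// Returns the quoted value of a `key = "value"` line if the key matches.
fn string_value(line: &str, key: &str) -> Option<String> {
    let (k, v) = line.split_once('=')?;
    if k.trim() != key {
        return None;
    }
    let v = v.trim().strip_prefix('"')?;
    let end = v.find('"')?;
    Some(v[..end].to_string())
}

/// Line-based scan of one manifest (assumption: no multi-line or inline tables).
fn discover_shards(manifest: &str) -> Vec<Shard> {
    let mut section = String::new();
    let mut krate = String::new();
    let mut benches = Vec::new();
    for raw in manifest.lines() {
        let line = raw.trim();
        if line.starts_with('[') {
            // Remember which table we are in, e.g. `[package]` or `[[bench]]`.
            section = line.to_string();
            continue;
        }
        if let Some(name) = string_value(line, "name") {
            match section.as_str() {
                "[package]" => krate = name,
                "[[bench]]" => benches.push(name),
                _ => {}
            }
        }
    }
    benches
        .into_iter()
        .map(|bench| Shard { krate: krate.clone(), bench })
        .collect()
}

/// Serializes shards as a JSON array suitable for a CI job matrix.
fn to_json(shards: &[Shard]) -> String {
    let items: Vec<String> = shards
        .iter()
        .map(|s| format!("{{\"crate\":\"{}\",\"bench\":\"{}\"}}", s.krate, s.bench))
        .collect();
    format!("[{}]", items.join(","))
}

fn main() {
    // Hypothetical manifest; real crates in the workspace will differ.
    let manifest = r#"
[package]
name = "demo-crate"

[[bench]]
name = "alpha"
harness = false

[[bench]]
name = "beta"
harness = false
"#;
    println!("{}", to_json(&discover_shards(manifest)));
}
```

Because each array element maps to exactly one bench binary, a new `[[bench]]` entry becomes a new shard without touching the workflow, and no single upload has to carry a whole crate's benchmarks.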
### Build once, run many

```
setup ─┐
       ├──→ bench (matrix, ~78 shards)
build ─┘
```

- `setup` parses every workspace member's `Cargo.toml` for `[[bench]]` entries (awk + jq) and emits a JSON `{crate, bench}` array; new bench targets are picked up automatically.
- `build` runs the full-workspace `cargo codspeed build` exactly once, packs `target/codspeed/` into a tarball (tar preserves the +x bit; `actions/upload-artifact` strips it otherwise), and uploads it as a 1-day artifact.
- Each bench shard downloads the artifact, unpacks it, and runs `cargo codspeed run -p <crate> --bench <bench>`. There is no per-shard rebuild, so CI cost scales with N × ~2 min instead of N × full build.

### Label-gated PRs

`codspeed-pr.yml` fires on `pull_request: [labeled, synchronize, opened, reopened]` and only runs when the PR has at least one `bench:*` label:

| Label                             | Effect                             |
| --------------------------------- | ---------------------------------- |
| `bench:all`                       | Every `[[bench]]` in the workspace |
| `bench:<crate>`                   | Every `[[bench]]` in that crate    |
| `bench:<crate-a> bench:<crate-b>` | Union                              |

Label suffixes are validated against `^[a-z][a-z0-9_-]*$`. Authorization is implicit: only users with write access can add labels. While the label is attached, every push to the PR re-runs the suite (`synchronize` event); re-runs cancel in-progress shards via `concurrency: cancel-in-progress: true`.

### OIDC auth

The repo is public, so no `CODSPEED_TOKEN` secret is required: the workflow's `id-token: write` permission lets it request the OIDC token that CodSpeed verifies. Workflows are repo-agnostic.

## Exclusions

Ten bench targets currently fail at runtime in this workspace. These are pre-existing issues in the bench targets themselves, not in the integration. They're listed in an `EXCLUDED_BENCHES` env in both workflows so the remaining ~78 shards run clean.
Each excluded target should be fixed (or removed) and dropped from the list one by one:

| Target | Observed failure mode |
| --- | --- |
| `arrow / merge_kernels` | panics at `arrow-data/src/transform/primitive.rs:31:43` |
| `arrow / buffer_bit_ops` | runtime error |
| `arrow / buffer_create` | runtime error |
| `arrow / sort_kernel` | runtime error |
| `arrow / string_run_builder` | runtime error |
| `arrow / primitive_run_accessor` | runtime error |
| `arrow-array / union_array` | runtime error |
| `arrow-cast / parse_date` | runtime error |
| `parquet / row_selection_cursor` | runtime error |
| `parquet-variant-compute / variant_kernels` | intermittent |

I'm happy to file separate upstream issues for each if helpful, or to drop the exclusion list entirely if maintainers prefer to investigate them all at once. The same `merge_kernels` exclusion was added by the official CodSpeed wizard's auto-generated PR (https://codspeed.io/docs/get-started/wizard), so this is consistent with prior art.

## Prerequisites for activation

This PR adds the workflow files, but they're inert until two repo-admin actions land:

1. **Install the [CodSpeed GitHub App](https://github.com/apps/codspeed)** on `apache/arrow-rs`. This is what posts the PR comparison comment + status check.
2. **Enroll the repository at https://codspeed.io**. OIDC is automatic for public repos, so no secret token configuration is required.

Once both are done, the first push to `main` will populate the baseline and PRs labeled `bench:*` will receive automated comparison comments.

## CI cost notes

- Main-push workflow: 1 build + 78 shards. The build job dominates wall-clock time (~10 min); shards run in parallel and download from one artifact, ~2 min each.
- PR workflow: same `build`, but only the bench shards for the labeled crates. A typical `bench:arrow-cast` run is build + 3 shards.
- Per-target bench binaries are bundled in one ~1-2 GB artifact (well under GitHub's 5 GB free-tier limit).
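The label gating and exclusion list described above can be sketched in Rust as follows. This is a hedged illustration, not the workflow's actual shell logic: `Selection`, `is_valid_suffix`, `resolve`, and `select_shards` are hypothetical names, the `example_a` / `example_b` bench names are made up, and silently ignoring an invalid label suffix is an assumption (the PR only says suffixes are validated).

```rust
use std::collections::BTreeSet;

/// What a PR's labels ask for.
#[derive(Debug, PartialEq)]
enum Selection {
    Nothing,                   // no usable `bench:*` label: run no shards
    All,                       // `bench:all`
    Crates(BTreeSet<String>),  // union of `bench:<crate>` labels
}

/// Hand-rolled equivalent of the pattern `^[a-z][a-z0-9_-]*$`.
fn is_valid_suffix(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-')
}

/// Maps PR labels to a selection. Assumption: invalid suffixes are skipped.
fn resolve(labels: &[&str]) -> Selection {
    let mut crates = BTreeSet::new();
    for label in labels {
        let Some(suffix) = label.strip_prefix("bench:") else { continue };
        if !is_valid_suffix(suffix) {
            continue;
        }
        if suffix == "all" {
            return Selection::All;
        }
        crates.insert(suffix.to_string());
    }
    if crates.is_empty() { Selection::Nothing } else { Selection::Crates(crates) }
}

/// Keeps the (crate, bench) shards that are selected and not excluded.
fn select_shards<'a>(
    all: &[(&'a str, &'a str)],
    selection: &Selection,
    excluded: &[(&str, &str)],
) -> Vec<(&'a str, &'a str)> {
    let mut out = Vec::new();
    for &(krate, bench) in all {
        let is_excluded = excluded.iter().any(|&(c, b)| c == krate && b == bench);
        let is_selected = match selection {
            Selection::Nothing => false,
            Selection::All => true,
            Selection::Crates(set) => set.contains(krate),
        };
        if is_selected && !is_excluded {
            out.push((krate, bench));
        }
    }
    out
}

fn main() {
    // `parse_date` is on the PR's exclusion list; the other names are hypothetical.
    let all = [
        ("arrow-cast", "parse_date"),
        ("arrow-cast", "example_a"),
        ("parquet", "example_b"),
    ];
    let excluded = [("arrow-cast", "parse_date")];
    let selection = resolve(&["bench:arrow-cast", "needs-review"]);
    for (krate, bench) in select_shards(&all, &selection, &excluded) {
        println!("cargo codspeed run -p {krate} --bench {bench}");
    }
}
```

Applying the exclusion filter after label resolution means `bench:all` and `bench:<crate>` share the same exclusion list, so fixing a bench target only requires removing one entry.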
## Test plan

- [x] `cargo check --workspace --benches --features arrow/test_utils,arrow-schema/ffi,parquet/test_common,parquet/experimental,parquet/async,parquet/object_store` passes against this branch
- [x] End-to-end validation on `pydantic/arrow-rs`: main baseline run captured 3031 benchmarks; PR run posts comparison comment correctly; per-target sharding stays under the 1000-benchmark limit
- [ ] After merge and CodSpeed-App install on `apache/arrow-rs`, first main run populates baseline at https://codspeed.io/apache/arrow-rs
- [ ] Create the `bench:all` and per-crate `bench:<crate>` labels in repo settings
- [ ] Add `bench:<crate>` to a real PR; confirm comparison comment + status check appear

## References

- CodSpeed docs: https://codspeed.io/docs
- Sharded benchmarks: https://codspeed.io/docs/features/sharded-benchmarks
- Compat shim source: https://github.com/CodSpeedHQ/codspeed-rust
- Prior auto-generated wizard PR on pydantic fork: https://github.com/pydantic/arrow-rs/pull/11 (single-shard; hit the >1000 limit, which this PR resolves)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
