andygrove commented on issue #4406: URL: https://github.com/apache/datafusion-comet/issues/4406#issuecomment-4519767548
## CI workload analysis for one `main` commit

To inform this issue, here is a breakdown of every workflow that ran on a single representative `push` to `main` (commit [`354ad46`](https://github.com/apache/datafusion-comet/commit/354ad4630679b77cbd6646ba697a49bbd9459b79), 2026-05-22). It shows where the Actions minutes actually go. The full report is in the collapsed section at the bottom.

### Workflow durations

| Workflow | Duration | Jobs |
|---|--:|--:|
| Spark SQL Tests | **1h 45m** | 30 |
| Run Miri Safety Checks | **1h 28m** | 1 |
| Iceberg Spark SQL Tests | **1h 23m** | 19 |
| PR Build (macOS) | **52m** | 23 |
| PR Build (Linux) | **52m** | 48 |
| CodeQL | 2m 09s | 1 |
| RAT License Check | 40s | 1 |
| Check all test suites added to PR workflows | 17s | 1 |
| Validate Github Workflows | 10s | 1 |

The commit finishes when the slowest workflow does (~1h 45m), gated by **Spark SQL Tests**. Every test workflow also pays a fixed **20-28m "Build Native Library"** cost before any test job can start.

### Where the time goes, and where to cut it

1. **Spark SQL Tests gates the whole commit at 1h 45m.** The long pole is the `sql_core-1` shard on Spark 4.0/4.1 (63-77m each). Re-balancing the `sql_core` shards, especially splitting the large `SQLQueryTestSuite` golden files off `sql_core-1`, would cut the critical path most directly.
2. **A handful of tests dominate Comet's own suites.** Across both PR Build workflows, roughly 10 tests carry the load, including `CometJoinSuite > Broadcast hash join build-side batch coalescing` (156-210s, once per Spark version), `CometAggregateSuite > all types, with nulls` (101-154s), and `CometFuzzIcebergSuite > Iceberg temporal types written as INT96/TIMESTAMP_MICROS/TIMESTAMP_MILLIS` (88-174s).
3. **The native build is a 20-28m fixed tax** paid before every test workflow. Better Rust build caching (sccache or a shared artifact) would shorten the start of every test workflow.
4. **Miri is bottlenecked by one crate.** The `spark-expr` test binary alone is 76 of the job's 88 minutes. Splitting Miri into parallel jobs, including sharding the `spark-expr` tests themselves rather than only splitting by crate, would roughly halve it.
5. **The Iceberg matrix carries dead weight.** Six `iceberg-spark-runtime` jobs run no test suite at all (~4m each of pure overhead), and the 1.8.1/1.9.1 jobs run the same suites silently for 35-46m. Trimming to one Spark version per variant and dropping the test-less runtime jobs would cut that workflow's wall time roughly in half.
6. **The Rust unit tests are not the cost.** All 546 nextest tests execute in ~6.3s total; the 20m `rust-test` job is almost entirely compilation.

<details>
<summary><b>Full report (per-workflow job timings and slowest tests)</b></summary>

### Methodology

- Per-test durations are exact for **Comet ScalaTest jobs** (PR Build Linux/macOS), **Spark SQL Tests** (ScalaTest `(duration)` lines), and **Rust nextest** (`PASS [Xs]`).
- **Iceberg** logs carry no per-test durations; the figures are timestamp-delta proxies distorted by parallel Gradle workers. Trust the class rankings, not the absolute seconds.
- **Miri** per-test timing is unreliable (under Miri's interpreted, cooperatively scheduled threading, test output arrives in a burst at the end), so attribution stops at the binary level.
- Job "wall time" is the first-to-last log timestamp and includes checkout/setup, not just test execution.

The shared up-front native build cost:

| Workflow | Native build job | Duration |
|---|---|--:|
| PR Build (Linux) | Build Native Library | 28m 07s |
| Iceberg Spark SQL Tests | Build Native Library | 28m 11s |
| Spark SQL Tests | Build Native Library | 27m 34s |
| PR Build (macOS) | Build Native Library (macOS) | 20m 20s |

### Spark SQL Tests (1h 45m)

Runs Apache Spark's own SQL test suites (`sql/core`, `sql/catalyst`, `sql/hive`) with the Comet plugin enabled, sharded across 28 test jobs over {Spark 3.4.3, 3.5.8, 4.0.2, 4.1.1}. 101,908 per-test lines parsed.

Slowest jobs:

| Job | Wall time |
|---|--:|
| sql_core-1 / spark-4.0.2-jdk21 | 77m |
| sql_hive-1 / spark-4.1.1-jdk17 | 68m |
| sql_core-1 / spark-4.1.1-jdk17 | 63m |
| sql_hive-1 / spark-4.0.2-jdk21 | 61m |
| sql_core-1 / spark-3.5.8-jdk11 | 57m |
| sql_core-2 / spark-4.1.1-jdk17 | 56m |
| sql_core-3 / spark-4.0.2-jdk21 | 53m |
| catalyst jobs | 16-18m (cheapest module) |

Top 15 slowest individual tests:

| Test | Dur | Job |
|---|--:|---|
| subquery/in-subquery/in-joins.sql | 121s | sql_core-2 / 3.5.8 |
| SPARK-40492: maintenance before unload | 120s | sql_core-2 / 3.4.3 |
| SPARK-40492: maintenance before unload | 120s | sql_core-2 / 3.5.8 |
| subquery/in-subquery/in-joins.sql | 118s | sql_core-2 / 4.1.1 |
| SPARK-39381: vectorized orc columnar writer batch size configurable | 113s | sql_core-1 / 3.5.8 |
| SPARK-39381: vectorized orc columnar writer batch size configurable | 112s | sql_core-1 / 3.5.8 |
| postgreSQL/join.sql | 110s | sql_core-2 / 4.1.1 |
| SPARK-39381: vectorized orc columnar writer batch size configurable | 103s | sql_core-1 / 4.0.2 |
| SPARK-48037: SortShuffleWriter shuffle write metrics | 99s | sql_core-3 / 3.5.8 |
| SPARK-39381: vectorized orc columnar writer batch size configurable | 97s | sql_core-1 / 4.0.2 |
| postgreSQL/join.sql | 97s | sql_core-2 / 3.5.8 |
| postgreSQL/join.sql | 96s | sql_core-2 / 4.0.2 |
| postgreSQL/join.sql | 92s | sql_core-2 / 3.4.3 |
| subquery/in-subquery/in-joins.sql | 90s | sql_core-2 / 3.4.3 |
| subquery/in-subquery/in-joins.sql | 77s | sql_core-2 / 4.0.2 |

The slowest tests are `SQLQueryTestSuite` golden-file replays (`in-joins.sql`, `postgreSQL/join.sql`), the ORC vectorized-writer test (`SPARK-39381`, repeatedly slow), and streaming state-store tests. `sql_core` is the heaviest module (608 job-minutes, avg 51m/job) versus `sql_hive` (443m, avg 37m) and `catalyst` (69m, avg 17m).
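
As a quick consistency check on the module totals above, dividing each module's job-minutes by its per-job average should recover a plausible job count. The sketch below uses only the figures quoted in this report; the job counts it prints are inferred from rounded averages (an assumption), not read from the workflow definition, and the helper name `implied_job_count` is made up for illustration.

```python
# Module totals and per-job averages quoted in the report (minutes).
modules = {
    "sql_core": {"job_minutes": 608, "avg_per_job": 51},
    "sql_hive": {"job_minutes": 443, "avg_per_job": 37},
    "catalyst": {"job_minutes": 69, "avg_per_job": 17},
}


def implied_job_count(job_minutes: int, avg_per_job: int) -> int:
    """Estimate the number of jobs from a total and a rounded average."""
    return round(job_minutes / avg_per_job)


# Inferred job count per module (assumption: averages were rounded to whole minutes).
implied = {name: implied_job_count(**m) for name, m in modules.items()}
total_jobs = sum(implied.values())
total_minutes = sum(m["job_minutes"] for m in modules.values())

for name, count in implied.items():
    share = modules[name]["job_minutes"] / total_minutes
    print(f"{name}: ~{count} jobs, {share:.0%} of test job-minutes")

# The implied counts sum to 28, matching the 28 test jobs stated for this workflow.
print(f"implied test jobs: {total_jobs}")
```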
Spark 4.0/4.1 jobs run roughly 25-30% slower per job than 3.4, though the JDK differs per version (3.x on JDK11, 4.0 on JDK21, 4.1 on JDK17), so version and JDK effects are confounded.

### Run Miri Safety Checks (1h 28m)

A single job running `cargo miri test`.

| Test binary | Duration | Tests |
|---|--:|--:|
| datafusion-comet-spark-expr | 4,578s (76m) | 386 (372 passed, 14 ignored) |
| datafusion-comet (core) | 355s (6m) | 113 (76 passed, 37 ignored) |
| other crates | < 1m each | |

The `spark-expr` crate test binary alone accounts for 76 of the job's 88 minutes. Per-test timing is not meaningful under Miri (the interpreted, cooperatively-scheduled test harness emits all `test ... ok` lines in a burst near the end), so attribution stops at the binary level.

### Iceberg Spark SQL Tests (1h 23m)

Builds Comet with Maven, then runs Apache Iceberg's Gradle test suites across {iceberg 1.8.1, 1.9.1, 1.10.0} over {Spark 3.4.3, 3.5.8} over 3 variants (18 test jobs).

Slowest jobs:

| Job | Wall | Gradle build+test |
|---|--:|--:|
| iceberg-spark / 1.10.0 / 3.4.3 | 54m | 50m |
| iceberg-spark / 1.10.0 / 3.5.8 | 54m | 49m |
| iceberg-spark / 1.9.1 / 3.4.3 | 50m | 46m |
| iceberg-spark-extensions / 1.10.0 / 3.4.3 | 46m | 42m |
| iceberg-spark-runtime / * (6 jobs) | 8-10m | ~4m (compiles test jars only, runs no suite) |

Top 10 slowest test classes (summed timestamp-delta proxy):

| Total | Tests | Test Class | Job |
|--:|--:|---|---|
| 628s | 159 | TestRewriteDataFilesAction | iceberg-spark / 1.10.0 / 3.4.3 |
| 619s | 159 | TestRewriteDataFilesAction | iceberg-spark / 1.10.0 / 3.5.8 |
| 379s | 518 | TestMergeOnReadMerge | iceberg-spark-extensions / 1.10.0 / 3.4.3 |
| 376s | 108 | TestStructuredStreamingRead3 | iceberg-spark / 1.10.0 / 3.4.3 |
| 364s | 553 | TestMergeOnReadMerge | iceberg-spark-extensions / 1.10.0 / 3.5.8 |
| 351s | 511 | TestCopyOnWriteMerge | iceberg-spark-extensions / 1.10.0 / 3.4.3 |
| 317s | 546 | TestCopyOnWriteMerge | iceberg-spark-extensions / 1.10.0 / 3.5.8 |
| 285s | 329 | TestMergeOnReadDelete | iceberg-spark-extensions / 1.10.0 / 3.4.3 |
| 252s | 108 | TestStructuredStreamingRead3 | iceberg-spark / 1.10.0 / 3.5.8 |
| 247s | 56 | TestStoragePartitionedJoins | iceberg-spark / 1.10.0 / 3.5.8 |

Only the 4 `iceberg-1.10.0` jobs emit per-test `PASSED` lines; the 1.8.1/1.9.1 jobs run the same suites silently for 35-46m. `TestRewriteDataFilesAction` (data-file compaction) is the single costliest class. The 6 `iceberg-spark-runtime` jobs only compile test jars and run no test suite at all.

### PR Build (Linux) (52m)

Comet's own test suite across {Spark 3.4, 3.5, 4.0, 4.1, 4.2} over 7 categories, plus Rust tests and TPC verification (48 jobs).

Slowest jobs:

| Job | Tests | Wall |
|---|--:|--:|
| Spark 4.1 [expressions] | 857 | 22m 36s |
| Spark 4.2 [expressions] | 853 | 22m 32s |
| Spark 4.0 [expressions] | 857 | 22m 08s |
| Spark 3.4 [expressions] | 851 | 21m 13s |
| ubuntu-latest / rust-test | 546 | 20m 18s |
| Spark 3.5 [expressions] | 856 | 20m 17s |
| Spark 4.2 [exec] | 439 | 19m 37s |
| Spark 4.2 [fuzz] | 144 | 18m 22s |

Top 12 slowest individual tests:

| Test | Dur | Suite | Job |
|---|--:|---|---|
| Broadcast hash join build-side batch coalescing | 182s | CometJoinSuite | Spark 3.5 [exec] |
| Broadcast hash join build-side batch coalescing | 178s | CometJoinSuite | Spark 4.1 [exec] |
| Broadcast hash join build-side batch coalescing | 177s | CometJoinSuite | Spark 4.2 [exec] |
| Broadcast hash join build-side batch coalescing | 173s | CometJoinSuite | Spark 4.0 [exec] |
| Broadcast hash join build-side batch coalescing | 156s | CometJoinSuite | Spark 3.4 [exec] |
| all types, with nulls | 154s | CometAggregateSuite | Spark 4.2 [exec] |
| all types, with nulls | 143s | CometAggregateSuite | Spark 4.0 [exec] |
| all types, with nulls | 141s | CometAggregateSuite | Spark 4.1 [exec] |
| all types, with nulls | 128s | CometAggregateSuite | Spark 3.5 [exec] |
| Iceberg temporal types written as INT96 | 121s | CometFuzzIcebergSuite | Spark 3.4 [fuzz] |
| Iceberg temporal types written as TIMESTAMP_MICROS | 115s | CometFuzzIcebergSuite | Spark 3.4 [fuzz] |
| Iceberg temporal types written as TIMESTAMP_MILLIS | 110s | CometFuzzIcebergSuite | Spark 3.4 [fuzz] |

Per-category profile:

| Category | #tests | mean | max | p95 |
|---|--:|--:|--:|--:|
| fuzz | 720 | 4.67s | 121s | 10.4s |
| shuffle | 1243 | 2.67s | 39s | 11.1s |
| exec | 2201 | 1.68s | 182s | 2.7s |
| parquet | 969 | 1.42s | 45s | 2.9s |
| expressions | 4274 | 1.12s | 64s | 3.7s |

`fuzz` has the slowest tests on average (heavy randomized data generation). `exec` holds the worst single outliers (join/aggregate) but a low median. `expressions` runs the most tests and the longest jobs, but each test is cheap, so its job duration comes from sheer volume.

Rust tests: the 546 nextest tests execute in ~6.3s total; the slowest three are `shuffle_writer` spill/coalescing tests (5.2s, 3.9s, 1.8s). The 20m job is almost entirely Rust compilation.

TPC verification: TPC-DS q72 is the standout query at ~66s under the hash join strategy versus ~26s under broadcast.

### PR Build (macOS) (52m)

Same Comet test suite as Linux, restricted to {Spark 4.0, 4.1, 4.2} on `macos-14` (23 jobs).

Slowest jobs:

| Job | Tests | Wall |
|---|--:|--:|
| Spark 4.2 [fuzz] | 144 | 30m 42s |
| Spark 4.0 [expressions] | 849 | 28m 06s |
| Spark 4.2 [expressions] | 845 | 22m 36s |
| Spark 4.2 [shuffle] | 265 | 20m 36s |
| Spark 4.2 [exec] | 439 | 18m 42s |
| Spark 4.0 [exec] | 441 | 18m 00s |

Top 10 slowest individual tests:

| Test | Dur | Suite | Job |
|---|--:|---|---|
| Broadcast hash join build-side batch coalescing | 210s | CometJoinSuite | Spark 4.2 [exec] |
| Broadcast hash join build-side batch coalescing | 205s | CometJoinSuite | Spark 4.0 [exec] |
| Broadcast hash join build-side batch coalescing | 197s | CometJoinSuite | Spark 4.1 [exec] |
| Iceberg temporal types written as INT96 | 174s | CometFuzzIcebergSuite | Spark 4.2 [fuzz] |
| Iceberg temporal types written as TIMESTAMP_MICROS | 168s | CometFuzzIcebergSuite | Spark 4.2 [fuzz] |
| Iceberg temporal types written as TIMESTAMP_MILLIS | 160s | CometFuzzIcebergSuite | Spark 4.2 [fuzz] |
| all types, with nulls | 128s | CometAggregateSuite | Spark 4.2 [exec] |
| all types, with nulls | 122s | CometAggregateSuite | Spark 4.0 [exec] |
| all types, with nulls | 101s | CometAggregateSuite | Spark 4.1 [exec] |
| Iceberg temporal types written as TIMESTAMP_MILLIS | 99s | CometFuzzIcebergSuite | Spark 4.1 [fuzz] |

The same critical tests as Linux run slower on macOS runners: the broadcast-join test hits ~210s versus ~180s on Linux, and the Iceberg temporal fuzz tests hit ~160-174s, making `Spark 4.2 [fuzz]` the single longest job in the workflow.

</details>

*Analysis based on the GitHub Actions job logs for the commit, parsed for per-test ScalaTest, nextest, and Gradle timing.*
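
For anyone who wants to reproduce per-test rankings like the ones in this report, here is a minimal Python sketch of the kind of log parsing involved. It is not the script used for this analysis. It assumes ScalaTest lines of the form `- test name (1 minute, 2 seconds)` and nextest lines of the form `PASS [   5.213s] crate test::path`, with any CI timestamp prefix already stripped. The sample lines are synthetic (built from durations reported above, not copied from the logs), and `example-crate` is a placeholder name.

```python
import re

# Assumed line shapes (illustrative, not copied from the Comet job logs):
#   ScalaTest: "- test name (1 minute, 2 seconds)"
#   nextest:   "PASS [   5.213s] crate-name module::test_name"
SCALATEST = re.compile(
    r"^\s*- (?P<name>.+) \((?P<dur>\d+ [a-z]+(?:, \d+ [a-z]+)*)\)\s*$"
)
NEXTEST = re.compile(r"^\s*PASS \[\s*(?P<secs>\d+(?:\.\d+)?)s\]\s+(?P<name>.+?)\s*$")

UNIT_SECONDS = {"millisecond": 0.001, "second": 1.0, "minute": 60.0, "hour": 3600.0}


def scalatest_seconds(text):
    """Convert '2 minutes, 34 seconds' to seconds; None if a unit is unknown."""
    total = 0.0
    for part in text.split(", "):
        amount, unit = part.split(" ")
        factor = UNIT_SECONDS.get(unit.rstrip("s"))
        if factor is None:
            return None  # e.g. a test name that merely ends in "(3 things)"
        total += int(amount) * factor
    return total


def parse_durations(lines):
    """Yield (test_name, seconds) for each line matching either shape."""
    for line in lines:
        m = SCALATEST.match(line)
        if m:
            secs = scalatest_seconds(m.group("dur"))
            if secs is not None:
                yield m.group("name"), secs
            continue
        m = NEXTEST.match(line)
        if m:
            yield m.group("name"), float(m.group("secs"))


def slowest(lines, n=3):
    """Return the n slowest (test_name, seconds) pairs, slowest first."""
    return sorted(parse_durations(lines), key=lambda item: item[1], reverse=True)[:n]


# Synthetic sample lines for illustration only.
sample = [
    "- Broadcast hash join build-side batch coalescing (3 minutes, 2 seconds)",
    "- all types, with nulls (2 minutes, 34 seconds)",
    "- cheap test (15 milliseconds)",
    "        PASS [   5.200s] example-crate shuffle_writer::tests::spill",
    "Run completed in 19 minutes, 37 seconds.",
]

for name, secs in slowest(sample):
    print(f"{secs:8.1f}s  {name}")
```

Lines that match neither shape, such as the run summary at the end of the sample, are skipped. Iceberg (Gradle) and Miri logs would need a different approach, since the methodology above notes they carry no reliable per-test durations.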
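
To make the "slowest workflow gates the commit" point concrete, the sketch below recomputes the gate from the workflow durations table and then applies one hypothetical saving. It assumes all workflows start together and run concurrently, which is what the report's framing implies. The 30-minute saving is an invented what-if, not a measured result, and the helper names `gate` and `with_savings` are made up for illustration.

```python
# Workflow wall-clock durations in seconds, from the workflow durations table.
workflows = {
    "Spark SQL Tests": 1 * 3600 + 45 * 60,
    "Run Miri Safety Checks": 1 * 3600 + 28 * 60,
    "Iceberg Spark SQL Tests": 1 * 3600 + 23 * 60,
    "PR Build (macOS)": 52 * 60,
    "PR Build (Linux)": 52 * 60,
    "CodeQL": 2 * 60 + 9,
    "RAT License Check": 40,
    "Check all test suites added to PR workflows": 17,
    "Validate Github Workflows": 10,
}


def gate(durations):
    """Return (name, seconds) of the workflow that finishes last."""
    name = max(durations, key=durations.get)
    return name, durations[name]


def with_savings(durations, savings):
    """Return a copy with hypothetical per-workflow savings (seconds) applied."""
    return {name: secs - savings.get(name, 0) for name, secs in durations.items()}


print(gate(workflows))  # ('Spark SQL Tests', 6300)

# Hypothetical what-if: shard rebalancing saves 30 minutes on Spark SQL Tests only.
what_if = with_savings(workflows, {"Spark SQL Tests": 30 * 60})
print(gate(what_if))  # ('Run Miri Safety Checks', 5280)
```

Under these assumptions, shaving 30 minutes off Spark SQL Tests alone only moves the gate to Miri at 1h 28m, so the Spark SQL and Miri items need to be addressed together to bring the commit's wall time much below that.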
