[CI] Multi-Cluster IT speedup via dual-cluster job sharding (PR #17695)

Yuan Tian Sun, 17 May 2026 00:02:56 -0700

Hi all,

I just opened a PR to speed up the Multi-Cluster IT pipeline. The
background, approach, and measured results are below. Reviews
welcome: https://github.com/apache/iotdb/pull/17695

== Background ==

`Multi-Cluster IT` runs 11 parallel jobs per PR. Five of them are
dual-cluster jobs on HighPerformanceMode (2 clusters × 4 nodes = 8
nodes per test). The longest of these took ~63 min and effectively
set the workflow's wall clock on its own. The other 6 jobs all
finished in ~5-8 min.

== Approach ==

This reuses the sharding pattern that PR #17692 introduced in
cluster-it-1c1d.yml. Each of the 5 dual jobs gets a `shard: [0, 1, 2]`
matrix dimension. The IT class list is split by hash-mod into 3
parallel shards, each shard's list is written to
`$RUNNER_TEMP/it-shard.txt`, and that file is passed to failsafe via
`-Dfailsafe.includesFile`.
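
The hash-mod split can be sketched as follows. This is a minimal
illustration, not the script in the PR: the hash function
(`zlib.crc32`), the helper names, the `**/<Class>.java` pattern
format, and the class names are all assumptions made for the example.

```python
import zlib
from pathlib import Path


def shard_of(class_name: str, num_shards: int) -> int:
    """Map a class name to a shard index using a hash that is stable across runs."""
    # Assumption: crc32 stands in for whatever hash the workflow uses.
    # Python's built-in hash() is salted per process for strings, so it
    # would not give reproducible shard assignments.
    return zlib.crc32(class_name.encode("utf-8")) % num_shards


def select_shard(class_names, shard, num_shards):
    """Return the sorted subset of class_names assigned to the given shard."""
    return [c for c in sorted(class_names) if shard_of(c, num_shards) == shard]


def write_includes_file(class_names, path):
    """Write one include pattern per line (pattern format is an assumption)."""
    lines = [f"**/{name}.java" for name in class_names]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical class names, for illustration only.
    classes = ["AlphaIT", "BetaIT", "GammaIT", "DeltaIT", "EpsilonIT"]
    for shard in range(3):
        print(shard, select_shard(classes, shard, 3))
```

Every class lands in exactly one shard, and the assignment depends
only on the class name, so a local dry run and the CI run should
produce the same lists.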

Only `.github/workflows/pipe-it.yml` was touched; no test code
changed (+110 / -5 lines).

== Measured results ==

Multi-Cluster IT workflow wall clock: ~63 min → ~33 min (~1.9×
speedup, ~30 min saved per PR).

Per-job measurements:

  Job                          Before    After     Speedup
  --------------------------   -------   -------   -------
  dual-table-manual-basic      ~63 min   ~33 min   1.9×
  dual-table-manual-enhanced   ~62 min   ~31 min   2.0×
  dual-tree-auto-enhanced      ~51 min   ~33 min   1.5×
  dual-tree-auto-basic         ~42 min   ~25 min   1.7×
  dual-tree-manual             ~27 min   ~15 min   1.8×
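
As a quick check of the arithmetic, the sketch below recomputes each
speedup as before/after and takes the workflow wall clock as the
maximum over the parallel jobs. The minute values are copied from the
table above; the function names are made up for this example, and the
6 non-dual jobs are left out because at ~5-8 min they never set the
maximum.

```python
# (before, after) durations in minutes, copied from the table above.
JOBS = {
    "dual-table-manual-basic": (63, 33),
    "dual-table-manual-enhanced": (62, 31),
    "dual-tree-auto-enhanced": (51, 33),
    "dual-tree-auto-basic": (42, 25),
    "dual-tree-manual": (27, 15),
}


def speedup(before: float, after: float) -> float:
    """Speedup factor of a job: old duration divided by new duration."""
    return before / after


def wall_clock(durations) -> float:
    """Jobs run in parallel, so the slowest one sets the wall clock."""
    return max(durations)


if __name__ == "__main__":
    for name, (before, after) in JOBS.items():
        print(f"{name:28s} {speedup(before, after):.1f}x")
    old = wall_clock(b for b, _ in JOBS.values())
    new = wall_clock(a for _, a in JOBS.values())
    print(f"workflow: {old} min -> {new} min ({speedup(old, new):.1f}x)")
```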

All 15 shards across the 5 jobs passed. RAT raised no `Files with
unapproved licenses` warning, because the shard file lives under
`$RUNNER_TEMP`, outside the repo. Per-shard class counts on CI matched
the local preview exactly (4/4/4, 3/3/3, 3/4/4, 4/5/4, 3/4/4).

== Note on actual vs. theoretical speedup ==

A 3-shard split should ideally give a ~3× speedup, but we measured
~1.9×. The reason is that name-based hash-mod does not balance by
per-class cost, so one shard can end up with most of the heavyweight
classes. For example, `dual-tree-auto-basic` shard 0 took 25 min while
shard 2 took only 9 min, and the wall clock is bounded by the slowest
shard.

To squeeze this further, follow-ups could include:
  - cost-weighted bin-packing by historical per-class duration (instead
    of name-based hash)
  - more shards per job (4 or 5) to reduce variance
  - cluster reuse within a shard (cf. AINodeSharedClusterIT from
    PR #17687)
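
For the first follow-up, one possible shape is a greedy
longest-first packing: sort classes by historical duration and always
give the next class to the currently lightest shard. This is a
heuristic sketch only, not something the PR implements. The function
name, the class names, and the durations are hypothetical.

```python
import heapq


def pack_by_cost(durations, num_shards):
    """Greedy longest-first packing of classes into shards.

    durations: dict mapping class name -> historical duration in minutes.
    Returns a list of (total_minutes, [class names]), one entry per shard.
    """
    # Min-heap of (current total, shard index); the lightest shard pops first.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Longest classes first; ties are broken by name to stay deterministic.
    for name, cost in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + cost, idx))
    totals = [0.0] * num_shards
    for total, idx in heap:
        totals[idx] = total
    return list(zip(totals, shards))


if __name__ == "__main__":
    # Hypothetical per-class durations in minutes, for illustration only.
    history = {"AIT": 12, "BIT": 9, "CIT": 7, "DIT": 5, "EIT": 4, "FIT": 3, "GIT": 2}
    for total, names in pack_by_cost(history, 3):
        print(f"{total:5.1f} min  {names}")
```

With these made-up numbers the shard totals come out at 15, 13, and
14 minutes against a perfectly balanced 14. The heuristic does not
guarantee an optimal split, and it only helps if the recorded
durations stay representative.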

But each of those significantly increases runner usage or requires
test-code changes, so this PR takes the lowest-risk path for a ~2×
win.

Reviews welcome.

Best regards,
----------------
Yuan Tian
