Hi all,

I just opened a PR to speed up the Multi-Cluster IT pipeline. The
background, approach, and measured results are below; reviews are
welcome: https://github.com/apache/iotdb/pull/17695

== Background ==

`Multi-Cluster IT` runs 11 parallel jobs per PR. Five of them are
dual-cluster jobs in HighPerformanceMode (2 clusters × 4 nodes = 8
nodes per test). The longest of these took ~63 min and on its own
effectively set the workflow's wall clock, while the other 6 jobs all
finished in ~5-8 min.

== Approach ==

This PR reuses the sharding pattern that PR #17692 introduced in
cluster-it-1c1d.yml. Each of the 5 dual jobs gets a `shard: [0, 1, 2]`
matrix dimension, and the IT class list is split across the 3 parallel
shards by hash-mod. Each shard writes its own class list to
`$RUNNER_TEMP/it-shard.txt` and passes it to failsafe via
`-Dfailsafe.includesFile`.
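
The PR does this inside the workflow file, which this email doesn't
show. As an illustration only, here is a minimal Python sketch of the
same idea. It assumes CRC32 as the hash, `**/<ClassName>.java` as the
include-pattern form, and hypothetical class names; the PR's actual hash
function and file format may differ.

```python
import os
import tempfile
import zlib
from pathlib import Path


def shard_of(class_name: str, num_shards: int) -> int:
    # CRC32 is stable across runs and machines. Python's built-in hash() is
    # salted per process for str, so it would reshuffle shards on every run.
    return zlib.crc32(class_name.encode("utf-8")) % num_shards


def write_includes_file(classes, shard, num_shards, path):
    """Write this shard's classes as failsafe include patterns, one per line."""
    mine = sorted(c for c in classes if shard_of(c, num_shards) == shard)
    # Assumed pattern form "**/<ClassName>.java"; the real workflow may differ.
    Path(path).write_text("".join(f"**/{c}.java\n" for c in mine), encoding="utf-8")
    return mine


if __name__ == "__main__":
    # Hypothetical class names and shard index, for illustration only.
    it_classes = ["PipeAlphaIT", "PipeBetaIT", "PipeGammaIT", "PipeDeltaIT"]
    shard_index = 0
    # Outside the repo, so RAT never sees the file.
    out_dir = os.environ.get("RUNNER_TEMP", tempfile.gettempdir())
    out_file = os.path.join(out_dir, "it-shard.txt")
    picked = write_includes_file(it_classes, shard_index, 3, out_file)
    print(f"shard {shard_index}: {picked}")
    print(f"then roughly: mvn verify -Dfailsafe.includesFile={out_file}")
```

Because the hash depends only on the class name, every shard job can
compute the full split independently and keep just its own slice, with
no coordination between matrix entries.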

Only `.github/workflows/pipe-it.yml` is touched (+110 / -5 lines); no
test code changed.

== Measured results ==

Multi-Cluster IT workflow wall clock: ~63 min → ~33 min (~1.9×
speedup, ~30 min saved per PR).

Per-job measurements:

  Job                          Before    After     Speedup
  --------------------------   -------   -------   -------
  dual-table-manual-basic      ~63 min   ~33 min   1.9×
  dual-table-manual-enhanced   ~62 min   ~31 min   2.0×
  dual-tree-auto-enhanced      ~51 min   ~33 min   1.5×
  dual-tree-auto-basic         ~42 min   ~25 min   1.7×
  dual-tree-manual             ~27 min   ~15 min   1.8×
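
The speedup column and the workflow-level figure follow directly from
the table's approximate numbers. A small Python check using only those
values:

```python
# Approximate wall-clock minutes (before, after) per job, from the table.
MEASUREMENTS = {
    "dual-table-manual-basic": (63, 33),
    "dual-table-manual-enhanced": (62, 31),
    "dual-tree-auto-enhanced": (51, 33),
    "dual-tree-auto-basic": (42, 25),
    "dual-tree-manual": (27, 15),
}


def speedup(before: float, after: float) -> float:
    return before / after


for job, (before, after) in MEASUREMENTS.items():
    print(f"{job:<28} {speedup(before, after):.1f}x")

# The dual jobs run in parallel, so the workflow waits for the slowest one.
# (The other 6 jobs finish in ~5-8 min and never set the wall clock.)
wf_before = max(b for b, _ in MEASUREMENTS.values())
wf_after = max(a for _, a in MEASUREMENTS.values())
print(f"workflow: {wf_before} -> {wf_after} min, "
      f"{speedup(wf_before, wf_after):.1f}x, {wf_before - wf_after} min saved")
```

Note that the workflow's new bottleneck is shared by two jobs
(`dual-table-manual-basic` and `dual-tree-auto-enhanced`, both ~33 min),
so shortening only one of them would not move the wall clock.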

All 15 shards across the 5 jobs passed. RAT reported no `Files with
unapproved licenses` warning, because the shard file lives under
`$RUNNER_TEMP`, outside the repository. The per-shard class counts on
CI matched the local dry run exactly (4/4/4, 3/3/3, 3/4/4, 4/5/4,
3/4/4).

== Note on actual vs. theoretical speedup ==

A 3-shard split should ideally give a ~3× speedup, but we measured
~1.9×. Name-based hash-mod spreads classes roughly evenly by count, not
by runtime, so one shard can still end up with the heavyweight
classes. For example, shard 0 of `dual-tree-auto-basic` took 25 min
while shard 2 took only 9 min, and the job's wall clock is bounded by
its slowest shard.
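
To make that bound concrete, here is a minimal sketch with hypothetical
shard timings (not measurements from the PR), ignoring the fixed
cluster-setup cost each shard pays. The effective speedup is total work
divided by the slowest shard, which reaches the shard count only when
the shards are perfectly balanced.

```python
def wall_clock(shard_minutes):
    # A sharded job finishes only when its slowest shard finishes.
    return max(shard_minutes)


def effective_speedup(shard_minutes):
    # Speedup vs. running all classes in one job, ignoring per-shard setup
    # overhead: total work divided by the slowest shard's time.
    return sum(shard_minutes) / wall_clock(shard_minutes)


balanced = [15.0, 15.0, 15.0]  # hypothetical: 45 min of work, evenly split
skewed = [24.0, 12.0, 9.0]     # hypothetical: same 45 min, one heavy shard

print(effective_speedup(balanced))  # 3.0
print(effective_speedup(skewed))    # 1.875
```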

To squeeze further, follow-ups could:
  - cost-weighted bin-packing by historical per-class duration (instead
    of name-based hash)
  - more shards per job (4 or 5) to reduce variance
  - cluster reuse within a shard (cf. AINodeSharedClusterIT from
    PR #17687)
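
The first follow-up could use the classic longest-processing-time-first
(LPT) greedy heuristic: sort classes by duration, heaviest first, and
give each one to the currently least-loaded shard. This is a minimal
sketch, assuming per-class durations are available from earlier CI runs
(where they would come from is not settled here); the function name,
class names, and durations below are hypothetical.

```python
import heapq


def lpt_shards(durations, num_shards):
    """Assign classes to shards, heaviest first, each to the least-loaded shard."""
    heap = [(0.0, s) for s in range(num_shards)]  # (current load, shard index)
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Ties in duration are broken by name so the result is deterministic.
    for cls, minutes in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, s = heapq.heappop(heap)
        shards[s].append(cls)
        heapq.heappush(heap, (load + minutes, s))
    loads = [0.0] * num_shards
    for load, s in heap:
        loads[s] = load
    return shards, loads


# Hypothetical per-class minutes, 45 min of total work.
durations = {
    "HeavyAIT": 12, "HeavyBIT": 10, "MidAIT": 6, "MidBIT": 5,
    "LightAIT": 3, "LightBIT": 3, "LightCIT": 2, "LightDIT": 2, "LightEIT": 2,
}
shards, loads = lpt_shards(durations, 3)
print(loads)  # [16.0, 15.0, 14.0]
```

With these made-up durations, LPT ends at 16/15/14 min, close to the
15-min ideal. The assignment shifts whenever durations drift, so the
recorded timings would need periodic refreshing.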

But each of those significantly increases runner usage or requires
test-code changes, so this PR takes the lowest-risk path for a ~2×
win.

Reviews welcome.

Best regards,
----------------
Yuan Tian
