Hi all,
While exploring how to speed up the Windows Unit-Test CI (PR #17693, now
closed), I noticed two dead surefire <execution> blocks in the root
pom.xml. On master they have never actually executed any tests, but they
silently become a trap for anyone trying to shard or select tests with
surefire CLI flags such as -Dsurefire.includesFile= or -Dtest=.
PR #17696 removes them.
What's wrong
The two executions (ids unit-tests and integration-tests) declare an
<includes> pattern prefixed with the source path src/test/:
<includes><include>src/test/**/*Test.java</include></includes>
By default, Surefire scans the compiled classes under
${project.build.testOutputDirectory} (i.e. target/test-classes/) and
matches <includes> against each class file's path relative to that
directory. Those relative paths never start with src/test/, so both
executions match zero test classes on every build.
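For illustration, a block of roughly this shape reproduces the problem.
This is a hypothetical sketch, not a copy of the root pom.xml: the
execution id and the <include> pattern come from the description above,
while the <phase> and <goals> values are assumptions.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <executions>
    <execution>
      <id>unit-tests</id>
      <!-- assumed binding; the actual phase in the root pom may differ -->
      <phase>test</phase>
      <goals>
        <goal>test</goal>
      </goals>
      <configuration>
        <includes>
          <!-- Never matches: candidate paths are relative to
               target/test-classes/ and look like org/example/FooTest.class,
               so none of them begins with src/test/. -->
          <include>src/test/**/*Test.java</include>
        </includes>
      </configuration>
    </execution>
  </executions>
</plugin>
```

A pattern without the source-path prefix, such as **/*Test.java (one of
Surefire's default includes), is the form that can match compiled test
classes.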
Evidence
From the latest successful Unit-Test run on master (job 76358113571,
Windows datanode):
┌───────────────────────────────────────────────────┬──────────┬───────────┐
│ Plugin execution │ Duration │ Tests run │
├───────────────────────────────────────────────────┼──────────┼───────────┤
│ surefire:test (default-test) │ 49 min │ 3629 │
├───────────────────────────────────────────────────┼──────────┼───────────┤
│ surefire:test (unit-tests) │ ~1 sec │ 0 │
├───────────────────────────────────────────────────┼──────────┼───────────┤
│ failsafe:integration-test (run-integration-tests) │ ~4 sec │ 1 │
├───────────────────────────────────────────────────┼──────────┼───────────┤
│ surefire:test (integration-tests) │ ~200 ms │ 0 │
└───────────────────────────────────────────────────┴──────────┴───────────┘
Why it's worth removing
Removing these blocks doesn't speed up master (the dead code already runs
zero tests), but two reasons make it worthwhile:
1. It's a trap. When a CLI flag such as -Dsurefire.includesFile=... is
supplied, it replaces the broken <includes> from the pom with a valid
pattern. The two previously dormant executions come alive, and the same
set of tests runs three times. PR #17693 hit exactly this.
2. It should have been removed already. The TODO comment above the plugin
acknowledges "Integration-Tests should be executed by the failsafe plugin",
and both modules containing *IT.java (iotdb-core/datanode/ and
integration-test/) now configure maven-failsafe-plugin explicitly.
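A module that owns *IT.java classes can bind them to failsafe along these
lines. This is a minimal sketch, not the actual configuration of either
module: the execution id run-integration-tests is taken from the job log
above, while the goal list and the absence of any further configuration
are assumptions.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <id>run-integration-tests</id>
      <goals>
        <!-- runs classes matching failsafe's default includes,
             e.g. **/*IT.java -->
        <goal>integration-test</goal>
        <!-- fails the build afterwards if any integration test failed -->
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With a binding like this in each module that needs it, no surefire
execution in the root pom has to pick up integration tests.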
Regression check
- Only iotdb-core/datanode/ and integration-test/ contain *IT.java; both
already configure failsafe explicitly, so IT coverage is unchanged.
- No module-level pom overrides or redeclares either execution.
- No CI workflow on master currently passes a surefire test-selection flag
that would activate the dead executions.
- The with-code-coverage profile only overrides surefire's <argLine>; it
does not depend on these executions existing.
PR: https://github.com/apache/iotdb/pull/17696
Reviews and feedback welcome.
Best regards,
Yuan Tian