Note: if you don't care about Java runner tests, you can stop reading now.

tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1] and
converted many to @NeedsRunner in order to reduce post-commit runtime.

This is work that was long overdue and finally got my attention due to the
Gradle migration. As context, @ValidatesRunner [2] tests construct a
TestPipeline and exercise runner behavior through SDK constructs. The tests
are written runner-agnostic so that they can be run on and validate all
supported runners.

The framework for these tests is great and writing them is super-easy. But
as a result, we have way too many of them-- over 250. These tests run
against all runners, and even when parallelized we see Dataflow post-commit
times exceeding 3-5 hours [3].

When reading through these tests, we found many of them don't actually
exercise runner-specific behavior, and were simply using the TestPipeline
framework to validate SDK components. This is a valid pattern, but tests
should be annotated with @NeedsRunner instead. With this annotation, the
tests will run on only a single runner, currently DirectRunner.

So, PR/5218 looks at all existing @ValidatesRunner tests and conservatively
converts tests which don't need to validate all runners into @NeedsRunner.
I've also sharded out some very large test classes into scenario-based
sub-classes. This is because Gradle parallelizes tests at the class-level,
and we found a couple very large test classes (ParDoTest) became stragglers
for the entire execution. Hopefully Gradle will soon implement dynamic
splitting :)

So, the action I'd like to request from others:
1) If you are an author of @ValidatesRunner tests, feel free to look over
the PR and let me know if I missed anything. Kenn Knowles is also helping
out here.
2) If you find yourself writing new @ValidatesRunner tests, please consider
whether your test is validating runner-provided behavior. If not, use
@NeedsRunner instead.


[1] https://github.com/apache/beam/pull/5218
[2]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java

[3]
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend

Reply via email to