This is very useful Scott. Thanks a lot. As a note, I suspect you meant https://github.com/apache/beam/pull/5193, instead of 5218. -P.
On Thu, May 3, 2018 at 9:07 AM Reuven Lax <[email protected]> wrote: > I suspect that at least some of these are because people copy/pasted other > tests, not realizing the overhead of ValidatesRunner. Is this something we > should document in the contributors guide? > > On Thu, May 3, 2018 at 8:54 AM Scott Wegner <[email protected]> wrote: > >> Note: if you don't care about Java runner tests, you can stop reading now. >> >> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1] >> and converted many to @NeedsRunner in order to reduce post-commit runtime. >> >> This is work that was long overdue and finally got my attention due to >> the Gradle migration. As context, @ValidatesRunner [2] tests construct a >> TestPipeline and exercise runner behavior through SDK constructs. The tests >> are written runner-agnostic so that they can be run on and validate all >> supported runners. >> >> The framework for these tests is great and writing them is super-easy. >> But as a result, we have way too many of them-- over 250. These tests run >> against all runners, and even when parallelized we see Dataflow post-commit >> times exceeding 3-5 hours [3]. >> >> When reading through these tests, we found many of them don't actually >> exercise runner-specific behavior, and were simply using the TestPipeline >> framework to validate SDK components. This is a valid pattern, but tests >> should be annotated with @NeedsRunner instead. With this annotation, the >> tests will run on only a single runner, currently DirectRunner. >> >> So, PR/5218 looks at all existing @ValidatesRunner tests and >> conservatively converts tests which don't need to validate all runners into >> @NeedsRunner. I've also sharded out some very large test classes into >> scenario-based sub-classes. This is because Gradle parallelizes tests at >> the class-level, and we found a couple very large test classes (ParDoTest) >> became stragglers for the entire execution. Hopefully Gradle will soon >> implement dynamic splitting :) >> >> So, the action I'd like to request from others: >> 1) If you are an author of @ValidatesRunner tests, feel free to look over >> the PR and let me know if I missed anything. Kenn Knowles is also helping >> out here. >> 2) If you find yourself writing new @ValidatesRunner tests, please >> consider whether your test is validating runner-provided behavior. If not, >> use @NeedsRunner instead. >> >> >> [1] https://github.com/apache/beam/pull/5218 >> [2] >> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java >> >> [3] >> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend >> >> > -- Got feedback? go/pabloem-feedback
