I think actually that the runner should have such an IT, not the core SDK.

On Thu, May 3, 2018 at 11:20 AM Eugene Kirpichov <[email protected]> wrote:
> Thanks Kenn! Note though that we should have VR tests for transforms that
> have a runner-specific override, such as TextIO.write() and Create that you
> mentioned.
>
> Agreed that it'd be good to have a more clear packaging separation between
> the two.
>
> On Thu, May 3, 2018, 10:35 AM Kenneth Knowles <[email protected]> wrote:
>
>> Since I went over the PR and dropped a lot of random opinions about what
>> should be VR versus NR, I'll answer too:
>>
>> VR - all primitives: ParDo, GroupByKey, Flatten.pCollections
>> (Flatten.iterables is an unrelated composite), Metrics
>> VR - critical special composites: Combine
>> VR - test infrastructure that ensures tests aren't vacuous: PAssert
>> NR - everything else in the core SDK that needs a runner but is really
>> only testing the transform, not the runner, notably Create, TextIO,
>> extended bits of Combine
>> (nothing) - everything in modules that depend on the core SDK can use
>> TestPipeline without an annotation; personally I think NR makes sense to
>> annotate these, but it has no effect
>>
>> And it is a good time to mention that it might be very cool for someone
>> to take on the task of conceiving of a more independent runner validation
>> suite. This framework is clever, but a bit deceptive, as runner tests look
>> like unit tests of the primitives.
>>
>> Kenn
>>
>> On Thu, May 3, 2018 at 9:24 AM Eugene Kirpichov <[email protected]> wrote:
>>
>>> Thanks Scott, this is awesome!
>>> However, we should be careful when choosing what should be
>>> ValidatesRunner and what should be NeedsRunner.
>>> Could you briefly describe how you made the call, and roughly what the
>>> statistics are before/after your PR (number of tests in each category)?
>>>
>>> On Thu, May 3, 2018 at 9:18 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>>
>>>> Thanks for the update, Scott. That's really a great job.
>>>>
>>>> I will ping you on Slack about some points as I'm preparing the build
>>>> for the release (and I have some issues 😁).
>>>>
>>>> Thanks again
>>>> Regards
>>>> JB
>>>>
>>>> On May 3, 2018, at 17:54, Scott Wegner <[email protected]> wrote:
>>>>>
>>>>> Note: if you don't care about Java runner tests, you can stop reading
>>>>> now.
>>>>>
>>>>> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1]
>>>>> and converted many to @NeedsRunner in order to reduce post-commit runtime.
>>>>>
>>>>> This is work that was long overdue and finally got my attention due to
>>>>> the Gradle migration. As context, @ValidatesRunner [2] tests construct a
>>>>> TestPipeline and exercise runner behavior through SDK constructs. The
>>>>> tests are written runner-agnostic so that they can run on, and validate,
>>>>> all supported runners.
>>>>>
>>>>> The framework for these tests is great, and writing them is super easy.
>>>>> But as a result, we have way too many of them -- over 250. These tests
>>>>> run against all runners, and even when parallelized we see Dataflow
>>>>> post-commit times exceeding 3-5 hours [3].
>>>>>
>>>>> When reading through these tests, we found many of them don't actually
>>>>> exercise runner-specific behavior, and were simply using the TestPipeline
>>>>> framework to validate SDK components. This is a valid pattern, but such
>>>>> tests should be annotated with @NeedsRunner instead. With this annotation,
>>>>> the tests run on only a single runner, currently the DirectRunner.
>>>>>
>>>>> So, PR/5218 looks at all existing @ValidatesRunner tests and
>>>>> conservatively converts tests which don't need to validate all runners
>>>>> into @NeedsRunner. I've also sharded out some very large test classes into
>>>>> scenario-based sub-classes. This is because Gradle parallelizes tests at
>>>>> the class level, and we found a couple of very large test classes
>>>>> (ParDoTest) became stragglers for the entire execution.
>>>>> Hopefully Gradle will soon implement dynamic splitting :)
>>>>>
>>>>> So, the actions I'd like to request from others:
>>>>> 1) If you are an author of @ValidatesRunner tests, feel free to look
>>>>> over the PR and let me know if I missed anything. Kenn Knowles is also
>>>>> helping out here.
>>>>> 2) If you find yourself writing new @ValidatesRunner tests, please
>>>>> consider whether your test is validating runner-provided behavior. If
>>>>> not, use @NeedsRunner instead.
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/5218
>>>>> [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java
>>>>> [3] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
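[Editor's note: the annotation pattern discussed in this thread looks roughly as follows. This is a sketch, not code from the PR: it assumes the Beam Java SDK and JUnit 4 on the classpath, and the test class `CountTest` and its pipeline contents are hypothetical illustrations. It is not runnable standalone without the Beam dependency.]

```java
import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.ValidatesRunner;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Hypothetical test class illustrating the two JUnit categories discussed above.
public class CountTest {

  // TestPipeline selects its runner from test configuration; the DirectRunner
  // is used by default.
  @Rule public final transient TestPipeline p = TestPipeline.create();

  // @Category(ValidatesRunner.class): included in the per-runner post-commit
  // suites, so it runs against every supported runner. Reserve this for tests
  // that exercise behavior the runner itself must provide (here, the grouping
  // semantics underlying Count.perElement()).
  @Test
  @Category(ValidatesRunner.class)
  public void testCountPerElement() {
    PCollection<KV<String, Long>> counts =
        p.apply(Create.of("a", "b", "a")).apply(Count.perElement());
    PAssert.that(counts).containsInAnyOrder(KV.of("a", 2L), KV.of("b", 1L));
    p.run();
  }

  // @Category(NeedsRunner.class): the test needs *a* runner to execute the
  // pipeline, but it only validates SDK-level behavior, so running it once on
  // a single runner is sufficient.
  @Test
  @Category(NeedsRunner.class)
  public void testCountGlobally() {
    PCollection<Long> count =
        p.apply(Create.of("a", "b", "a")).apply(Count.globally());
    PAssert.that(count).containsInAnyOrder(3L);
    p.run();
  }
}
```

Test frameworks then select by category: the ValidatesRunner suites filter on `ValidatesRunner.class` per runner, while `NeedsRunner.class` tests are only picked up by the single-runner (DirectRunner) suite.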
