Thanks for the feedback. For methodology, I crudely went through existing tests and looked at whether they exercise runner behavior or not. When I wasn't sure, I opted to leave them as-is. And then I leaned on Kenn's expertise to help categorize further :)
For the current state: here's a run of the Flink ValidatesRunner tests with my change [1] and without it [2]. We reduced the number of tests from 581 to 267 (a 54% reduction) and the runtime from 17m to 6m22s (a 63% reduction). Note that Flink runs the tests in both Batch and Streaming variants. Dataflow should see a similar percentage reduction, but I don't have numbers yet as those tests are currently not stable.

[1] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/314/
[2] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/315/

On Thu, May 3, 2018 at 2:43 PM Kenneth Knowles <[email protected]> wrote:

> I think actually that the runner should have such an IT, not the core SDK.
>
> On Thu, May 3, 2018 at 11:20 AM Eugene Kirpichov <[email protected]> wrote:
>
>> Thanks Kenn! Note though that we should have VR tests for transforms that
>> have a runner-specific override, such as TextIO.write() and the Create
>> transform you mentioned.
>>
>> Agreed that it'd be good to have a clearer packaging separation between
>> the two.
>>
>> On Thu, May 3, 2018, 10:35 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> Since I went over the PR and dropped a lot of random opinions about what
>>> should be VR versus NR, I'll answer too:
>>>
>>> VR - all primitives: ParDo, GroupByKey, Flatten.pCollections
>>> (Flatten.iterables is an unrelated composite), Metrics
>>> VR - critical special composites: Combine
>>> VR - test infrastructure that ensures tests aren't vacuous: PAssert
>>> NR - everything else in the core SDK that needs a runner but is really
>>> only testing the transform, not the runner, notably Create, TextIO, and
>>> extended bits of Combine
>>> (nothing) - everything in modules that depend on the core SDK can use
>>> TestPipeline without an annotation; personally I think NR makes sense to
>>> annotate these, but it has no effect
>>>
>>> And it is a good time to mention that it would be very cool for someone
>>> to take on the task of conceiving a more independent runner validation
>>> suite. This framework is clever, but a bit deceptive, as runner tests look
>>> like unit tests of the primitives.
>>>
>>> Kenn
>>>
>>> On Thu, May 3, 2018 at 9:24 AM Eugene Kirpichov <[email protected]> wrote:
>>>
>>>> Thanks Scott, this is awesome!
>>>> However, we should be careful when choosing what should be
>>>> ValidatesRunner and what should be NeedsRunner.
>>>> Could you briefly describe how you made the call, and roughly what the
>>>> statistics are before and after your PR (number of tests in each category)?
>>>>
>>>> On Thu, May 3, 2018 at 9:18 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>>>
>>>>> Thanks for the update, Scott. That's really a great job.
>>>>>
>>>>> I will ping you on Slack about some points as I'm preparing the build
>>>>> for the release (and I have some issues 😁).
>>>>>
>>>>> Thanks again
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On May 3, 2018, at 17:54, Scott Wegner <[email protected]> wrote:
>>>>>
>>>>>> Note: if you don't care about Java runner tests, you can stop reading
>>>>>> now.
>>>>>>
>>>>>> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1]
>>>>>> and converted many of them to @NeedsRunner in order to reduce post-commit
>>>>>> runtime.
>>>>>>
>>>>>> This work was long overdue and finally got my attention due to the
>>>>>> Gradle migration. As context, @ValidatesRunner [2] tests construct a
>>>>>> TestPipeline and exercise runner behavior through SDK constructs. The
>>>>>> tests are written to be runner-agnostic so that they can run on, and
>>>>>> validate, all supported runners.
>>>>>>
>>>>>> The framework for these tests is great and writing them is super easy.
>>>>>> But as a result, we have far too many of them: over 250. These tests run
>>>>>> against all runners, and even when parallelized we see Dataflow
>>>>>> post-commit times exceeding 3-5 hours [3].
>>>>>>
>>>>>> When reading through these tests, we found that many of them don't
>>>>>> actually exercise runner-specific behavior and were simply using the
>>>>>> TestPipeline framework to validate SDK components. This is a valid
>>>>>> pattern, but such tests should be annotated with @NeedsRunner instead.
>>>>>> With that annotation, the tests run on only a single runner, currently
>>>>>> the DirectRunner.
>>>>>>
>>>>>> So, PR/5218 looks at all existing @ValidatesRunner tests and
>>>>>> conservatively converts the tests which don't need to validate all
>>>>>> runners into @NeedsRunner. I've also sharded some very large test
>>>>>> classes into scenario-based sub-classes, because Gradle parallelizes
>>>>>> tests at the class level and we found that a couple of very large test
>>>>>> classes (ParDoTest) became stragglers for the entire execution.
>>>>>> Hopefully Gradle will implement dynamic splitting soon :)
>>>>>>
>>>>>> The actions I'd like to request from others:
>>>>>> 1) If you are an author of @ValidatesRunner tests, feel free to look
>>>>>> over the PR and let me know if I missed anything. Kenn Knowles is also
>>>>>> helping out here.
>>>>>> 2) If you find yourself writing new @ValidatesRunner tests, please
>>>>>> consider whether your test is validating runner-provided behavior. If
>>>>>> not, use @NeedsRunner instead.
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/5218
>>>>>> [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java
>>>>>> [3] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
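
For anyone who hasn't written one of these before, here's a rough sketch of the difference between the two annotations. The class and test names below are made up for illustration; the annotations, TestPipeline, PAssert, Create, and ParDo usage are the actual SDK testing constructs discussed above:

import java.io.Serializable;

import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.ValidatesRunner;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

@RunWith(JUnit4.class)
public class AnnotationExampleTest implements Serializable {

  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  // Exercises a runner primitive (ParDo), so every runner should execute it:
  // annotate with @ValidatesRunner.
  @Test
  @Category(ValidatesRunner.class)
  public void testParDoDoublesElements() {
    PCollection<Integer> output =
        pipeline
            .apply(Create.of(1, 2, 3))
            .apply(ParDo.of(new DoFn<Integer, Integer>() {
              @ProcessElement
              public void process(ProcessContext c) {
                c.output(c.element() * 2);
              }
            }));
    PAssert.that(output).containsInAnyOrder(2, 4, 6);
    pipeline.run();
  }

  // Only checks SDK-level behavior of a transform, so a single runner
  // (currently the DirectRunner) is enough: annotate with @NeedsRunner.
  @Test
  @Category(NeedsRunner.class)
  public void testCreateProducesElements() {
    PCollection<String> output = pipeline.apply(Create.of("a", "b"));
    PAssert.that(output).containsInAnyOrder("a", "b");
    pipeline.run();
  }
}

The PR essentially flips tests that look like the second one from @ValidatesRunner to @NeedsRunner, so they only run against the DirectRunner suite instead of every runner.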
