Thanks for the feedback. For methodology, I crudely went through existing
tests and looked at whether they exercise runner behavior or not. When I
wasn't sure, I opted to leave them as-is. And then I leaned on Kenn's
expertise to help categorize further :)

For the current state: here's a run of the Flink ValidatesRunner tests with
my change [1] and without [2]. We reduced the number of tests from 581 to
267 (a 54% reduction) and the runtime from 17m to 6m22s (a 63% reduction).
Note that Flink runs the tests in both Batch and Streaming variants.
Dataflow should see a similar percentage reduction, but I don't have
numbers yet as those tests are currently not stable.

[1]
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/314/
[2]
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/315/

On Thu, May 3, 2018 at 2:43 PM Kenneth Knowles <[email protected]> wrote:

> I think actually that the runner should have such an IT, not the core SDK.
>
> On Thu, May 3, 2018 at 11:20 AM Eugene Kirpichov <[email protected]>
> wrote:
>
>> Thanks Kenn! Note though that we should have VR tests for transforms that
>> have a runner-specific override, such as the TextIO.write() and Create that
>> you mentioned.
>>
>> Agreed that it'd be good to have a clearer packaging separation between
>> the two.
>>
>> On Thu, May 3, 2018, 10:35 AM Kenneth Knowles <[email protected]> wrote:
>>
>>> Since I went over the PR and dropped a lot of random opinions about what
>>> should be VR versus NR, I'll answer too:
>>>
>>> VR - all primitives: ParDo, GroupByKey, Flatten.pCollections
>>> (Flatten.iterables is an unrelated composite), Metrics
>>> VR - critical special composites: Combine
>>> VR - test infrastructure that ensures tests aren't vacuous: PAssert
>>> NR - everything else in the core SDK that needs a runner but is really
>>> only testing the transform, not the runner, notably Create, TextIO,
>>> extended bits of Combine
>>> (nothing) - everything in modules that depend on the core SDK can use
>>> TestPipeline without an annotation; personally I think it makes sense to
>>> annotate these as NR, but it has no effect
>>>
>>> And this is a good time to mention that it would be very cool for someone
>>> to take on the task of designing a more independent runner validation
>>> suite. The current framework is clever, but a bit deceptive, as runner
>>> tests look like unit tests of the primitives.
>>>
>>> Kenn
>>>
>>> On Thu, May 3, 2018 at 9:24 AM Eugene Kirpichov <[email protected]>
>>> wrote:
>>>
>>>> Thanks Scott, this is awesome!
>>>> However, we should be careful when choosing what should be
>>>> ValidatesRunner and what should be NeedsRunner.
>>>> Could you briefly describe how you made the call, and roughly what the
>>>> statistics are before/after your PR (number of tests in both categories)?
>>>>
>>>> On Thu, May 3, 2018 at 9:18 AM Jean-Baptiste Onofré <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the update, Scott. That's really great work.
>>>>>
>>>>> I will ping you on Slack about some points as I'm preparing the build
>>>>> for the release (and I have some issues 😁).
>>>>>
>>>>> Thanks again
>>>>> Regards
>>>>> JB
>>>>> On May 3, 2018, at 17:54, Scott Wegner <[email protected]> wrote:
>>>>>>
>>>>>> Note: if you don't care about Java runner tests, you can stop reading
>>>>>> now.
>>>>>>
>>>>>> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218
>>>>>> [1] and converted many to @NeedsRunner in order to reduce post-commit
>>>>>> runtime.
>>>>>>
>>>>>> This is work that was long overdue and finally got my attention due to
>>>>>> the Gradle migration. As context, @ValidatesRunner [2] tests construct a
>>>>>> TestPipeline and exercise runner behavior through SDK constructs. The
>>>>>> tests are written to be runner-agnostic so that they can run on, and
>>>>>> validate, all supported runners.
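>>>>>>
>>>>>> As a rough sketch (illustrative only, not taken from the PR; the class
>>>>>> name, transform, and values are made up), a VR test looks something like
>>>>>> this:
>>>>>>
>>>>>> import org.apache.beam.sdk.testing.PAssert;
>>>>>> import org.apache.beam.sdk.testing.TestPipeline;
>>>>>> import org.apache.beam.sdk.testing.ValidatesRunner;
>>>>>> import org.apache.beam.sdk.transforms.Create;
>>>>>> import org.apache.beam.sdk.transforms.DoFn;
>>>>>> import org.apache.beam.sdk.transforms.ParDo;
>>>>>> import org.apache.beam.sdk.values.PCollection;
>>>>>> import org.junit.Rule;
>>>>>> import org.junit.Test;
>>>>>> import org.junit.experimental.categories.Category;
>>>>>>
>>>>>> public class ExampleParDoTest {
>>>>>>   // TestPipeline picks up the runner under test from the test options.
>>>>>>   @Rule public final transient TestPipeline p = TestPipeline.create();
>>>>>>
>>>>>>   // ValidatesRunner: exercises the ParDo primitive, so it should run
>>>>>>   // against every supported runner.
>>>>>>   @Test
>>>>>>   @Category(ValidatesRunner.class)
>>>>>>   public void testParDoAddsOne() {
>>>>>>     PCollection<Integer> output =
>>>>>>         p.apply(Create.of(1, 2, 3))
>>>>>>             .apply(
>>>>>>                 ParDo.of(
>>>>>>                     new DoFn<Integer, Integer>() {
>>>>>>                       @ProcessElement
>>>>>>                       public void processElement(ProcessContext c) {
>>>>>>                         c.output(c.element() + 1);
>>>>>>                       }
>>>>>>                     }));
>>>>>>     PAssert.that(output).containsInAnyOrder(2, 3, 4);
>>>>>>     p.run();
>>>>>>   }
>>>>>> }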
>>>>>>
>>>>>> The framework for these tests is great and writing them is super-easy.
>>>>>> But as a result, we have way too many of them: over 250. These tests run
>>>>>> against all runners, and even when parallelized we see Dataflow
>>>>>> post-commit times of 3-5 hours [3].
>>>>>>
>>>>>> When reading through these tests, we found that many of them don't
>>>>>> actually exercise runner-specific behavior and simply use the
>>>>>> TestPipeline framework to validate SDK components. This is a valid
>>>>>> pattern, but such tests should be annotated with @NeedsRunner instead.
>>>>>> With this annotation, the tests run on only a single runner, currently
>>>>>> the DirectRunner.
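>>>>>>
>>>>>> For contrast, a sketch of an NR test (again illustrative, with made-up
>>>>>> names): it only validates an SDK-level transform, so any single runner
>>>>>> is enough to execute the pipeline.
>>>>>>
>>>>>> import org.apache.beam.sdk.testing.NeedsRunner;
>>>>>> import org.apache.beam.sdk.testing.PAssert;
>>>>>> import org.apache.beam.sdk.testing.TestPipeline;
>>>>>> import org.apache.beam.sdk.transforms.Create;
>>>>>> import org.apache.beam.sdk.transforms.MapElements;
>>>>>> import org.apache.beam.sdk.transforms.SimpleFunction;
>>>>>> import org.apache.beam.sdk.values.PCollection;
>>>>>> import org.junit.Rule;
>>>>>> import org.junit.Test;
>>>>>> import org.junit.experimental.categories.Category;
>>>>>>
>>>>>> public class ExampleSdkTransformTest {
>>>>>>   @Rule public final transient TestPipeline p = TestPipeline.create();
>>>>>>
>>>>>>   // NeedsRunner: needs some runner to execute the pipeline, but the
>>>>>>   // assertion is about the SDK transform, not runner behavior.
>>>>>>   @Test
>>>>>>   @Category(NeedsRunner.class)
>>>>>>   public void testExclaim() {
>>>>>>     PCollection<String> output =
>>>>>>         p.apply(Create.of("a", "b"))
>>>>>>             .apply(
>>>>>>                 MapElements.via(
>>>>>>                     new SimpleFunction<String, String>() {
>>>>>>                       @Override
>>>>>>                       public String apply(String input) {
>>>>>>                         return input + "!";
>>>>>>                       }
>>>>>>                     }));
>>>>>>     PAssert.that(output).containsInAnyOrder("a!", "b!");
>>>>>>     p.run();
>>>>>>   }
>>>>>> }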
>>>>>>
>>>>>> So, PR/5218 looks at all existing @ValidatesRunner tests and
>>>>>> conservatively converts tests which don't need to validate all runners
>>>>>> into @NeedsRunner. I've also sharded some very large test classes into
>>>>>> scenario-based sub-classes. This is because Gradle parallelizes tests at
>>>>>> the class level, and we found that a couple of very large test classes
>>>>>> (e.g. ParDoTest) became stragglers for the entire execution. Hopefully
>>>>>> Gradle will soon implement dynamic splitting :)
>>>>>>
>>>>>> So, the actions I'd like to request from others:
>>>>>> 1) If you are an author of @ValidatesRunner tests, feel free to look
>>>>>> over the PR and let me know if I missed anything. Kenn Knowles is also
>>>>>> helping out here.
>>>>>> 2) If you find yourself writing new @ValidatesRunner tests, please
>>>>>> consider whether your test is validating runner-provided behavior. If
>>>>>> not, use @NeedsRunner instead.
>>>>>>
>>>>>>
>>>>>> [1] https://github.com/apache/beam/pull/5218
>>>>>> [2]
>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java
>>>>>>
>>>>>> [3]
>>>>>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
>>>>>>
>>>>>>
>>>>>
