I think actually that the runner should have such an IT, not the core SDK.

On Thu, May 3, 2018 at 11:20 AM Eugene Kirpichov <[email protected]> wrote:
> Thanks Kenn! Note though that we should have VR tests for transforms that
> have a runner-specific override, such as TextIO.write() and Create that you
> mentioned.
>
> Agreed that it'd be good to have a more clear packaging separation between
> the two.
>
> On Thu, May 3, 2018, 10:35 AM Kenneth Knowles <[email protected]> wrote:
>
>> Since I went over the PR and dropped a lot of random opinions about what
>> should be VR versus NR, I'll answer too:
>>
>> VR - all primitives: ParDo, GroupByKey, Flatten.pCollections
>> (Flatten.iterables is an unrelated composite), Metrics
>> VR - critical special composites: Combine
>> VR - test infrastructure that ensures tests aren't vacuous: PAssert
>> NR - everything else in the core SDK that needs a runner but is really
>> only testing the transform, not the runner, notably Create, TextIO,
>> extended bits of Combine
>> (nothing) - everything in modules that depend on the core SDK can use
>> TestPipeline without an annotation; personally I think NR makes sense to
>> annotate these, but it has no effect
>>
>> And it is a good time to mention that it might be very cool for someone
>> to take on the task of conceiving of a more independent runner validation
>> suite. This framework is clever, but a bit deceptive, as runner tests look
>> like unit tests of the primitives.
>>
>> Kenn
>>
>> On Thu, May 3, 2018 at 9:24 AM Eugene Kirpichov <[email protected]> wrote:
>>
>>> Thanks Scott, this is awesome!
>>> However, we should be careful when choosing what should be
>>> ValidatesRunner and what should be NeedsRunner.
>>> Could you briefly describe how you made the call, and roughly what the
>>> statistics are before/after your PR (number of tests in each category)?
>>>
>>> On Thu, May 3, 2018 at 9:18 AM Jean-Baptiste Onofré <[email protected]> wrote:
>>>
>>>> Thanks for the update, Scott. That's really a great job.
>>>>
>>>> I will ping you on Slack about some points as I'm preparing the build
>>>> for the release (and I have some issues 😁).
>>>>
>>>> Thanks again
>>>> Regards
>>>> JB
>>>>
>>>> On May 3, 2018, at 17:54, Scott Wegner <[email protected]> wrote:
>>>>>
>>>>> Note: if you don't care about Java runner tests, you can stop reading
>>>>> now.
>>>>>
>>>>> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1]
>>>>> and converted many to @NeedsRunner in order to reduce post-commit runtime.
>>>>>
>>>>> This is work that was long overdue and finally got my attention due to
>>>>> the Gradle migration. As context, @ValidatesRunner [2] tests construct a
>>>>> TestPipeline and exercise runner behavior through SDK constructs. The
>>>>> tests are written runner-agnostic so that they can run on, and validate,
>>>>> all supported runners.
>>>>>
>>>>> The framework for these tests is great, and writing them is super easy.
>>>>> But as a result, we have way too many of them -- over 250. These tests
>>>>> run against all runners, and even when parallelized we see Dataflow
>>>>> post-commit times exceeding 3-5 hours [3].
>>>>>
>>>>> When reading through these tests, we found many of them don't actually
>>>>> exercise runner-specific behavior, and were simply using the TestPipeline
>>>>> framework to validate SDK components. This is a valid pattern, but such
>>>>> tests should be annotated with @NeedsRunner instead. With this annotation,
>>>>> the tests run on only a single runner, currently the DirectRunner.
>>>>>
>>>>> So, PR/5218 looks at all existing @ValidatesRunner tests and
>>>>> conservatively converts tests which don't need to validate all runners
>>>>> into @NeedsRunner. I've also sharded out some very large test classes into
>>>>> scenario-based sub-classes. This is because Gradle parallelizes tests at
>>>>> the class level, and we found a couple of very large test classes
>>>>> (ParDoTest) became stragglers for the entire execution.
>>>>> Hopefully Gradle will soon implement dynamic splitting :)
>>>>>
>>>>> So, the actions I'd like to request from others:
>>>>> 1) If you are an author of @ValidatesRunner tests, feel free to look
>>>>> over the PR and let me know if I missed anything. Kenn Knowles is also
>>>>> helping out here.
>>>>> 2) If you find yourself writing new @ValidatesRunner tests, please
>>>>> consider whether your test is validating runner-provided behavior. If
>>>>> not, use @NeedsRunner instead.
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/5218
>>>>> [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java
>>>>> [3] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
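[Editor's note: the annotation pattern discussed in this thread looks roughly as follows. This is a sketch, not code from the PR: it assumes the Beam Java SDK and JUnit 4 on the classpath, and the test class `CountTest` and its pipeline contents are hypothetical illustrations. It is not runnable standalone without the Beam dependency.]

```java
import org.apache.beam.sdk.testing.NeedsRunner;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.ValidatesRunner;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.junit.Rule;
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Hypothetical test class illustrating the two JUnit categories discussed above.
public class CountTest {

  // TestPipeline selects its runner from test configuration; the DirectRunner
  // is used by default.
  @Rule public final transient TestPipeline p = TestPipeline.create();

  // @Category(ValidatesRunner.class): included in the per-runner post-commit
  // suites, so it runs against every supported runner. Reserve this for tests
  // that exercise behavior the runner itself must provide (here, the grouping
  // semantics underlying Count.perElement()).
  @Test
  @Category(ValidatesRunner.class)
  public void testCountPerElement() {
    PCollection<KV<String, Long>> counts =
        p.apply(Create.of("a", "b", "a")).apply(Count.perElement());
    PAssert.that(counts).containsInAnyOrder(KV.of("a", 2L), KV.of("b", 1L));
    p.run();
  }

  // @Category(NeedsRunner.class): the test needs *a* runner to execute the
  // pipeline, but it only validates SDK-level behavior, so running it once on
  // a single runner is sufficient.
  @Test
  @Category(NeedsRunner.class)
  public void testCountGlobally() {
    PCollection<Long> count =
        p.apply(Create.of("a", "b", "a")).apply(Count.globally());
    PAssert.that(count).containsInAnyOrder(3L);
    p.run();
  }
}
```

Test frameworks then select by category: the ValidatesRunner suites filter on `ValidatesRunner.class` per runner, while `NeedsRunner.class` tests are only picked up by the single-runner (DirectRunner) suite.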
