Awesome. This is a huge benefit to Beam & the sanity of everyone working on
it.

When a class is factored to use the "enclosed" runner (or whatnot) does
that add parallelism in Gradle? If not, can we just life them to new
top-level classes?

Or, sort of bigger question, is it Dataflow quota / throttling that causes
it to be 2 hours? If so, there's not much point messing with this
already-fast-enough build. We did have some cool ideas around re-using the
TestPipeline to run multiple unit tests together in a single job, but I
think that's a sizeable project (I'd love to be proved wrong).

Kenn

On Wed, May 9, 2018 at 8:57 AM Scott Wegner <[email protected]> wrote:

> FYI, this change is now merged [1]. Along with the many many other
> improvements made during the Gradle migration, many of our Jenkins test
> suites are now significantly faster than they used to be:
>
> * Java Precommit: before ~1h30m [2], now 37m [3] (59% reduction)
> * Flink ValidatesRunner: before ~30m [4], now 7.5m [5] (75% reduction)
> * Nightly Snapshot: before ~1h [6], now ~20m [7] (67% reduction)
>
> The Dataflow ValidatesRunner test suite is back to where it was before
> [8]: just over 2 hours. [9]. This suite didn't improve significantly over
> Maven primarily due to Gradle's test parallelization. Tests are
> parallelized at the test class level whereas Maven Surefire supported
> parallelizing test cases within a class.
>
>
> [1] https://github.com/apache/beam/pull/5193
> [2]
> https://builds.apache.org/job/beam_PreCommit_Java_MavenInstall/buildTimeTrend
>
> [3]
> https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/buildTimeTrend
>
> [4]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/buildTimeTrend
>
> [5]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/buildTimeTrend
>
> [6]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/buildTimeTrend
>
> [7]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
>
> [8]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/buildTimeTrend
>
> [9]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
>
>
> On Fri, May 4, 2018 at 2:34 AM Etienne Chauchot <[email protected]>
> wrote:
>
>> Scott, thanks for that !
>>
>> I only quickly looked at the ValidatesRunner tests that I wrote (you
>> modified none) and the ones that impact my ongoing work (metrics).
>> I think some tests in MetricsTest still need to be ValidatesRunner tests.
>> See my comment in the review.
>>
>> Etienne
>>
>> Note: if you don't care about Java runner tests, you can stop reading now.
>>
>> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1]
>> and converted many to @NeedsRunner in order to reduce post-commit runtime.
>>
>> This is work that was long overdue and finally got my attention due to
>> the Gradle migration. As context, @ValidatesRunner [2] tests construct a
>> TestPipeline and exercise runner behavior through SDK constructs. The tests
>> are written runner-agnostic so that they can be run on and validate all
>> supported runners.
>>
>> The framework for these tests is great and writing them is super-easy.
>> But as a result, we have way too many of them-- over 250. These tests run
>> against all runners, and even when parallelized we see Dataflow post-commit
>> times exceeding 3-5 hours [3].
>>
>> When reading through these tests, we found many of them don't actually
>> exercise runner-specific behavior, and were simply using the TestPipeline
>> framework to validate SDK components. This is a valid pattern, but tests
>> should be annotated with @NeedsRunner instead. With this annotation, the
>> tests will run on only a single runner, currently DirectRunner.
>>
>> So, PR/5218 looks at all existing @ValidatesRunner tests and
>> conservatively converts tests which don't need to validate all runners into
>> @NeedsRunner. I've also sharded out some very large test classes into
>> scenario-based sub-classes. This is because Gradle parallelizes tests at
>> the class-level, and we found a couple very large test classes (ParDoTest)
>> became stragglers for the entire execution. Hopefully Gradle will soon
>> implement dynamic splitting :)
>>
>> So, the action I'd like to request from others:
>> 1) If you are an author of @ValidatesRunner tests, feel free to look over
>> the PR and let me know if I missed anything. Kenn Knowles is also helping
>> out here.
>> 2) If you find yourself writing new @ValidatesRunner tests, please
>> consider whether your test is validating runner-provided behavior. If not,
>> use @NeedsRunner instead.
>>
>>
>> [1] https://github.com/apache/beam/pull/5218
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java
>>
>> [3]
>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
>>
>>
>>

Reply via email to