Scott Wegner created BEAM-4100:
----------------------------------
Summary: Dataflow ValidatesRunner PostCommits are nearing the 3
hour max runtime
Key: BEAM-4100
URL: https://issues.apache.org/jira/browse/BEAM-4100
Project: Beam
Issue Type: Bug
Components: runner-dataflow
Reporter: Scott Wegner
The beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle test suite has been
getting slower and slower over time. We run over 250 pipelines, and Dataflow
has a fixed cost of about 3 minutes per pipeline just to spin up VMs. With
Gradle we are also not able to parallelize execution quite as well, which puts
the test suite very close to the 3 hour limit.
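For a rough sense of scale, the figures above can be turned into a
back-of-the-envelope calculation. This is only an illustrative sketch: the
class and method names are hypothetical, it uses 250 pipelines as a lower
bound, and it counts only the ~3 minute VM startup cost, ignoring the time
each pipeline actually spends running.

```java
// Hypothetical helper, not part of Beam: estimates the fixed VM startup cost.
final class StartupCostEstimate {

    // Total startup minutes if every pipeline ran one after another.
    static long totalStartupMinutes(int pipelines, int startupMinutesPerPipeline) {
        return (long) pipelines * startupMinutesPerPipeline;
    }

    // Minimum number of pipelines that must run concurrently, on average,
    // for the startup cost alone to fit in the limit (ceiling division).
    static int minConcurrentPipelines(long totalMinutes, long limitMinutes) {
        return (int) ((totalMinutes + limitMinutes - 1) / limitMinutes);
    }

    public static void main(String[] args) {
        long total = totalStartupMinutes(250, 3);          // 750 minutes = 12h30m
        int needed = minConcurrentPipelines(total, 180);   // 3 hour limit = 180 minutes
        System.out.println("Serial startup cost: " + total + " min");
        System.out.println("Minimum average concurrency: " + needed);
    }
}
```

Under these assumptions, startup alone costs 750 minutes if run serially, so
the suite needs an average of at least 5 pipelines in flight to stay under
3 hours.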
We should take steps to decrease the overall execution time for these
post-commits. Some ideas, roughly in order of recommended action:
# Convert some ValidatesRunner tests to NeedsRunner. ValidatesRunner should be
used only when the test needs to validate functionality of each runner. I
suspect that many of these are either duplicates or only need to be run on a
single runner.
# Break up large test classes. Gradle parallelizes at the test class level, so
the overall execution time is constrained by the slowest test class.
ParDoTest, for example, runs over 50 pipelines, so the overall execution will
always take more than 50 * 3 min = 2h30m.
# Investigate how to achieve additional ValidatesRunner test parallelism. For
example, we could:
## Build a custom JUnit runner which packs a set of pipeline graphs into a
single job to execute together.
## Work with Dataflow to support a way to reuse VMs between jobs, to decrease
the overall startup cost.
## Work on the Gradle Java plugin to support parallelization at the test case
level, similar to Maven Surefire.
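
To illustrate why the slowest test class bounds the whole run, here is a
minimal sketch of the arithmetic from the second idea above. The class and
method names are hypothetical, as is the "OtherTest" entry. It assumes
pipelines within one test class run sequentially, that every class can run
concurrently, and it counts only the ~3 minute startup cost per pipeline.

```java
import java.util.Map;

// Hypothetical helper, not part of Beam or Gradle.
final class SlowestClassBound {

    static final int STARTUP_MINUTES = 3; // approximate fixed cost per pipeline

    // Lower bound on wall-clock minutes: the class with the most pipelines
    // cannot finish before all of its pipelines have started up in turn.
    static int lowerBoundMinutes(Map<String, Integer> pipelinesPerClass) {
        int largest = 0;
        for (int count : pipelinesPerClass.values()) {
            largest = Math.max(largest, count);
        }
        return largest * STARTUP_MINUTES;
    }

    // Bound for a single class after splitting it evenly into `parts` classes.
    static int boundAfterSplit(int pipelines, int parts) {
        int perPart = (pipelines + parts - 1) / parts; // ceiling division
        return perPart * STARTUP_MINUTES;
    }

    public static void main(String[] args) {
        // ParDoTest's count comes from the issue text; OtherTest is made up.
        Map<String, Integer> suite = Map.of("ParDoTest", 50, "OtherTest", 10);
        System.out.println("Bound today: " + lowerBoundMinutes(suite) + " min");
        System.out.println("ParDoTest split 5 ways: " + boundAfterSplit(50, 5) + " min");
    }
}
```

With these numbers the bound drops from 150 minutes to 30 minutes once
ParDoTest is split into five classes of 10 pipelines each, provided the
build has enough parallel workers to run them all at once.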