The ValidatesRunner test parallelism is controlled here and is currently set to "unlimited": https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
Each test fork runs on a separate Gradle worker, so the number of parallel test runs is capped by the maximum number of workers, which is configured here: https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50

It is currently set to 3 * the number of CPU cores. We are already running up to 48 Dataflow jobs in parallel.

On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez <rfern...@google.com> wrote:

> - How many resources do ValidatesRunner tests use?
> - Where are those settings?
>
> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax <re...@google.com> wrote:
>
>> The specific issue only affects Dataflow ValidatesRunner tests. We
>> currently allow only one of these to run at a time, to control usage of
>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>> issue.
>>
>> I would like to see if it's possible to increase Dataflow quota so we can
>> run more of these in parallel. It took me 8 hours end to end to run these
>> tests (about 6 hours for the run to be scheduled). If there was a failure,
>> I would have had to repeat the whole process. In the worst case, this
>> process could have taken me days. While this is not as pressing as some
>> other issues (as most people don't need to run the Dataflow tests on every
>> PR), fixing it would make such changes much easier to manage.
>>
>> Reuven
>>
>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <rfern...@google.com>
>> wrote:
>>
>>> +Reuven Lax <re...@google.com> told me yesterday that he was waiting
>>> for some test to be scheduled and run, and it took 6 hours or so. I would
>>> like to help reduce these wait times by increasing parallelism. I need help
>>> understanding the continuous minimum of what we use. It seems the following
>>> is true:
>>>
>>>
>>> - There seems to always be 16 Jenkins machines on (16 CPUs each)
>>> - There seems to be three GKE machines always on (1 CPU each)
>>> - Most (if not all) unit tests run on 1 machine, and seem to run
>>> one-at-a-time <-- I think we can safely parallelize this to 20.
>>>
>>> With current quotas, if we parallelize to 20 concurrent unit tests, we
>>> still have room for 80 other concurrent Dataflow jobs to execute, with 75%
>>> of CPU capacity.
>>>
>>> Thoughts? Additional data?
>>>
>>> Thanks,
>>> r
>>>
>>
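
For reference, the arithmetic behind the 48 figure at the top of this message can be sketched roughly as below. This is only an illustration, not code from the Beam repository: the function and parameter names are hypothetical, and the 16-core figure is taken from the "16 CPUs each" estimate in the quoted message rather than from the Jenkins configuration itself.

```python
def max_parallel_test_forks(cpu_cores, workers_per_core=3, fork_limit=None):
    """Rough upper bound on concurrently running test forks.

    cpu_cores        -- cores on the Jenkins machine (assumed 16 here)
    workers_per_core -- worker multiplier (3, per the message above)
    fork_limit       -- per-task fork cap; None models the "unlimited" setting
    """
    # Each fork needs its own worker, so the worker pool is the ceiling.
    max_workers = workers_per_core * cpu_cores
    if fork_limit is None:
        return max_workers
    # A finite fork cap can only lower the bound, never raise it.
    return min(max_workers, fork_limit)


if __name__ == "__main__":
    # Unlimited forks on a 16-core machine: bounded by 3 * 16 workers.
    print(max_parallel_test_forks(16))
```

Under these assumptions the worker pool, not the fork setting, is what bounds the number of Dataflow jobs in flight.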