ValidatesRunner test parallelism is controlled here and is currently set
to "unlimited":
https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115

Each test fork runs on a separate Gradle worker, so the number of parallel
test runs is capped by the maximum number of workers, which is configured
here:
https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
It is currently set to 3 * the number of CPU cores.

We are already running up to 48 Dataflow jobs in parallel.
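
As a rough sketch of that arithmetic (plain Python, not the actual Gradle
or Jenkins configuration): the function and parameter names are made up for
illustration, the 16 cores come from the Jenkins machine size mentioned in
the quoted message, and I am assuming each test fork drives one Dataflow
job at a time.

```python
def max_parallel_test_forks(cpu_cores, workers_per_core=3, max_parallel_forks=None):
    """Upper bound on concurrent test forks, assuming one fork per Gradle worker.

    max_parallel_forks=None models the "unlimited" fork setting, in which
    case the worker cap is the only limit.
    """
    max_workers = workers_per_core * cpu_cores
    if max_parallel_forks is None:
        return max_workers
    return min(max_parallel_forks, max_workers)


# Hypothetical 16-core Jenkins machine with the current 3x multiplier.
print(max_parallel_test_forks(16))  # 48
```

So with "unlimited" forks, the worker cap is what actually bounds the number
of concurrent runs; lowering either setting would lower the bound.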


On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez <rfern...@google.com>
wrote:

> - How many resources do ValidatesRunner tests use?
> - Where are those settings?
>
> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax <re...@google.com> wrote:
>
>> The specific issue only affects Dataflow ValidatesRunner tests. We
>> currently allow only one of these to run at a time, to control usage of
>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>> issue.
>>
>> I would like to see if it's possible to increase Dataflow quota so we can
>> run more of these in parallel. It took me 8 hours end to end to run these
>> tests (about 6 hours for the run to be scheduled). If there was a failure,
>> I would have had to repeat the whole process. In the worst case, this
>> process could have taken me days. While this is not as pressing as some
>> other issues (as most people don't need to run the Dataflow tests on every
>> PR), fixing it would make such changes much easier to manage.
>>
>> Reuven
>>
>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <rfern...@google.com>
>> wrote:
>>
>>> +Reuven Lax <re...@google.com> told me yesterday that he was waiting
>>> for some test to be scheduled and run, and it took 6 hours or so. I would
>>> like to help reduce these wait times by increasing parallelism. I need help
>>> understanding the continuous minimum of what we use. It seems the following
>>> is true:
>>>
>>>
>>>    - There seems to always be 16 jenkins machines on (16 CPUs each)
>>>    - There seems to be three GKE machines always on (1 CPU each)
>>>    - Most (if not all) unit tests run on 1 machine, and seem to run
>>>    one-at-a-time <-- I think we can safely parallelize this to 20.
>>>
>>> With current quotas, if we parallelize to 20 concurrent unit tests, we
>>> still have room for 80 other concurrent dataflow jobs to execute, with 75%
>>> of CPU capacity.
>>>
>>> Thoughts? Additional data?
>>>
>>> Thanks,
>>> r
>>>
>>
