One thing that is nice when you do this is to be able to share your
results. Though if all you are sharing is "they passed" then I guess we
don't have to insist on evidence.

Kenn

On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner <sc...@apache.org> wrote:

> A few thoughts:
>
> * The Jenkins job getting backed up
> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
> is idle more often than backlogged.
>
> * It's difficult to reason about our exact quota needs because Dataflow
> jobs get launched from various Jenkins jobs that have different parallelism
> configurations. If we have budget, we could enable concurrent execution of
> this job and increase our quota enough to give some breathing room. If we
> do this, I recommend limiting the max concurrency via
> throttleConcurrentBuilds [2] to some reasonable limit.
>
> * This test suite is meant to be an exhaustive post-commit validation of
> Dataflow runner, and tests a lot of different aspects of a runner. It would
> be more efficient to run locally only the tests affected by your change.
> Note that this requires having access to a GCP project with billing, but
> most Dataflow developers probably have access to this already. The command
> for this is:
>
> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner
> -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot
> --tests "org.apache.beam.MyTestClass"
>
> [1]
> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
> [2]
> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>
>
> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> The validates runner test parallelism is controlled here and is currently
>> set to be "unlimited":
>>
>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>
>> Each test fork is run on a different gradle worker, so the number of
>> parallel test runs is limited to the max number of workers configured which
>> is controlled here:
>>
>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>> It is currently configured to 3 * number of CPU cores.
>>
>> We are already running up to 48 Dataflow jobs in parallel.
>>
>>
>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez <rfern...@google.com>
>> wrote:
>>
>>> - How many resources do ValidatesRunner tests use?
>>> - Where are those settings?
>>>
>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> The specific issue only affects Dataflow ValidatesRunner tests. We
>>>> currently allow only one of these to run at a time, to control usage of
>>>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>>>> issue.
>>>>
>>>> I would like to see if it's possible to increase Dataflow quota so we
>>>> can run more of these in parallel. It took me 8 hours end to end to run
>>>> these tests (about 6 hours for the run to be scheduled). If there was a
>>>> failure, I would have had to repeat the whole process. In the worst case,
>>>> this process could have taken me days. While this is not as pressing as
>>>> some other issues (as most people don't need to run the Dataflow tests on
>>>> every PR), fixing it would make such changes much easier to manage.
>>>>
>>>> Reuven
>>>>
>>>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <rfern...@google.com>
>>>> wrote:
>>>>
>>>>> +Reuven Lax <re...@google.com> told me yesterday that he was waiting
>>>>> for some test to be scheduled and run, and it took 6 hours or so. I would
>>>>> like to help reduce these wait times by increasing parallelism. I need
>>>>> help understanding the continuous minimum of what we use. It seems the
>>>>> following is true:
>>>>>
>>>>>
>>>>>    - There seem to always be 16 Jenkins machines on (16 CPUs each)
>>>>>    - There seem to be three GKE machines always on (1 CPU each)
>>>>>    - Most (if not all) unit tests run on 1 machine, and seem to run
>>>>>    one-at-a-time <-- I think we can safely parallelize this to 20.
>>>>>
>>>>> With current quotas, if we parallelize to 20 concurrent unit tests, we
>>>>> still have room for 80 other concurrent dataflow jobs to execute, with 75%
>>>>> of CPU capacity.
>>>>>
>>>>> Thoughts? Additional data?
>>>>>
>>>>> Thanks,
>>>>> r
>>>>>
>>>>
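For reference, the parallelism figures quoted in this thread can be checked with a small calculation. This is only a rough sketch: the function name and the concurrent_builds parameter are made up for illustration, and it assumes each Gradle test worker launches at most one Dataflow job at a time and that every concurrent Jenkins build lands on a 16-CPU machine.

```python
def max_parallel_dataflow_jobs(cpu_cores, workers_per_core=3, concurrent_builds=1):
    """Upper bound on Dataflow jobs launched at once (hypothetical helper).

    cpu_cores:         cores on the Jenkins machine (16 per the thread)
    workers_per_core:  Gradle max-workers multiplier (3 per the thread)
    concurrent_builds: Jenkins builds of the job allowed at once
                       (assumed knob, e.g. a throttleConcurrentBuilds limit)
    """
    if min(cpu_cores, workers_per_core, concurrent_builds) < 1:
        raise ValueError("all arguments must be positive integers")
    return cpu_cores * workers_per_core * concurrent_builds


# One build at a time on a 16-core machine, as today.
print(max_parallel_dataflow_jobs(16))

# Allowing two concurrent builds would double the upper bound.
print(max_parallel_dataflow_jobs(16, concurrent_builds=2))
```

With one build at a time this reproduces the "up to 48" figure above. The second call suggests the quota headroom to budget for if concurrent execution of the job is enabled.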
