One thing that is nice when you do this is to be able to share your results. Though if all you are sharing is "they passed" then I guess we don't have to insist on evidence.
Kenn

On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner <sc...@apache.org> wrote:

> A few thoughts:
>
> * The Jenkins job getting backed up is
> beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
> is idle more often than backlogged.
>
> * It's difficult to reason about our exact quota needs because Dataflow
> jobs get launched from various Jenkins jobs that have different parallelism
> configurations. If we have budget, we could enable concurrent execution of
> this job and increase our quota enough to give some breathing room. If we
> do this, I recommend limiting the max concurrency via
> throttleConcurrentBuilds [2] to some reasonable limit.
>
> * This test suite is meant to be an exhaustive post-commit validation of
> Dataflow runner, and tests a lot of different aspects of a runner. It would
> be more efficient to run locally only the tests affected by your change.
> Note that this requires having access to a GCP project with billing, but
> most Dataflow developers probably have access to this already.
> The command for this is:
>
> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot --tests "org.apache.beam.MyTestClass"
>
> [1] https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
> [2] https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>
> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik <lc...@google.com> wrote:
>
>> The validates runner test parallelism is controlled here and is currently
>> set to be "unlimited":
>>
>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>
>> Each test fork is run on a different gradle worker, so the number of
>> parallel test runs is limited to the max number of workers configured which
>> is controlled here:
>>
>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>> It is currently configured to 3 * number of CPU cores.
>>
>> We are already running up to 48 Dataflow jobs in parallel.
>>
>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez <rfern...@google.com> wrote:
>>
>>> - How many resources do ValidatesRunner tests use?
>>> - Where are those settings?
>>>
>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> The specific issue only affects Dataflow ValidatesRunner tests. We
>>>> currently allow only one of these to run at a time, to control usage of
>>>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>>>> issue.
>>>>
>>>> I would like to see if it's possible to increase Dataflow quota so we
>>>> can run more of these in parallel. It took me 8 hours end to end to run
>>>> these tests (about 6 hours for the run to be scheduled).
>>>> If there was a failure, I would have had to repeat the whole process.
>>>> In the worst case, this process could have taken me days. While this is
>>>> not as pressing as some other issues (as most people don't need to run
>>>> the Dataflow tests on every PR), fixing it would make such changes much
>>>> easier to manage.
>>>>
>>>> Reuven
>>>>
>>>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <rfern...@google.com> wrote:
>>>>
>>>>> +Reuven Lax <re...@google.com> told me yesterday that he was waiting
>>>>> for some test to be scheduled and run, and it took 6 hours or so. I
>>>>> would like to help reduce these wait times by increasing parallelism.
>>>>> I need help understanding the continuous minimum of what we use. It
>>>>> seems the following is true:
>>>>>
>>>>> - There seems to always be 16 jenkins machines on (16 CPUs each)
>>>>> - There seems to be three GKE machines always on (1 CPU each)
>>>>> - Most (if not all) unit tests run on 1 machine, and seem to run
>>>>> one-at-a-time <-- I think we can safely parallelize this to 20.
>>>>>
>>>>> With current quotas, if we parallelize to 20 concurrent unit tests, we
>>>>> still have room for 80 other concurrent dataflow jobs to execute, with
>>>>> 75% of CPU capacity.
>>>>>
>>>>> Thoughts? Additional data?
>>>>>
>>>>> Thanks,
>>>>> r
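
A quick sanity check on the figures quoted in this thread, as a minimal Python sketch. It assumes the "3 * number of CPU cores" worker limit is evaluated on one of the 16-CPU Jenkins machines mentioned above; the thread does not state this explicitly. The function and constant names are illustrative only and are not Beam, Gradle, or Jenkins settings.

```python
# Figures taken from the thread; names are hypothetical, for illustration only.
CPUS_PER_JENKINS_MACHINE = 16  # "16 jenkins machines on (16 CPUs each)"
JENKINS_MACHINES = 16
WORKERS_PER_CORE = 3           # "configured to 3 * number of CPU cores"


def max_gradle_workers(cpu_cores: int, workers_per_core: int = WORKERS_PER_CORE) -> int:
    """Upper bound on parallel test forks, assuming one fork per Gradle worker."""
    return workers_per_core * cpu_cores


def total_jenkins_cpus(machines: int = JENKINS_MACHINES,
                       cpus_each: int = CPUS_PER_JENKINS_MACHINE) -> int:
    """Always-on Jenkins CPU count implied by the machine inventory."""
    return machines * cpus_each


if __name__ == "__main__":
    print(max_gradle_workers(CPUS_PER_JENKINS_MACHINE))  # 48
    print(total_jenkins_cpus())                          # 256
```

Under that assumption the worker limit comes out to 48, which is consistent with the "up to 48 Dataflow jobs in parallel" figure.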