OK, Scott just sent https://github.com/apache/beam/pull/5860. Quotas should not be a problem; if they are, please file a JIRA under gcp-quota.
Cheers,
r

On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles <k...@google.com> wrote:

> One thing that is nice when you do this is to be able to share your
> results. Though if all you are sharing is "they passed" then I guess we
> don't have to insist on evidence.
>
> Kenn
>
> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner <sc...@apache.org> wrote:
>
>> A few thoughts:
>>
>> * The Jenkins job getting backed up
>> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
>> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
>> is idle more often than backlogged.
>>
>> * It's difficult to reason about our exact quota needs because Dataflow
>> jobs get launched from various Jenkins jobs that have different parallelism
>> configurations. If we have budget, we could enable concurrent execution of
>> this job and increase our quota enough to give some breathing room. If we
>> do this, I recommend limiting the max concurrency via
>> throttleConcurrentBuilds [2] to some reasonable limit.
>>
>> * This test suite is meant to be an exhaustive post-commit validation of
>> Dataflow runner, and tests a lot of different aspects of a runner. It would
>> be more efficient to run locally only the tests affected by your change.
>> Note that this requires having access to a GCP project with billing, but
>> most Dataflow developers probably have access to this already. The command
>> for this is:
>>
>> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner
>> -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot
>> --tests "org.apache.beam.MyTestClass"
>>
>> [1]
>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
>> [2]
>> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>>
>>
>> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> The validates runner test parallelism is controlled here and is
>>> currently set to be "unlimited":
>>>
>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>>
>>> Each test fork is run on a different gradle worker, so the number of
>>> parallel test runs is limited to the max number of workers configured which
>>> is controlled here:
>>>
>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>>> It is currently configured to 3 * number of CPU cores.
>>>
>>> We are already running up to 48 Dataflow jobs in parallel.
>>>
>>>
>>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez <rfern...@google.com>
>>> wrote:
>>>
>>>> - How many resources do ValidatesRunner tests use?
>>>> - Where are those settings?
>>>>
>>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> The specific issue only affects Dataflow ValidatesRunner tests. We
>>>>> currently allow only one of these to run at a time, to control usage of
>>>>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>>>>> issue.
>>>>>
>>>>> I would like to see if it's possible to increase Dataflow quota so we
>>>>> can run more of these in parallel. It took me 8 hours end to end to run
>>>>> these tests (about 6 hours for the run to be scheduled). If there was a
>>>>> failure, I would have had to repeat the whole process. In the worst case,
>>>>> this process could have taken me days. While this is not as pressing as
>>>>> some other issues (as most people don't need to run the Dataflow tests on
>>>>> every PR), fixing it would make such changes much easier to manage.
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <rfern...@google.com>
>>>>> wrote:
>>>>>
>>>>>> +Reuven Lax <re...@google.com> told me yesterday that he was waiting
>>>>>> for some test to be scheduled and run, and it took 6 hours or so. I would
>>>>>> like to help reduce these wait times by increasing parallelism. I need
>>>>>> help understanding the continuous minimum of what we use. It seems the
>>>>>> following is true:
>>>>>>
>>>>>>
>>>>>> - There seems to always be 16 Jenkins machines on (16 CPUs each)
>>>>>> - There seems to be three GKE machines always on (1 CPU each)
>>>>>> - Most (if not all) unit tests run on 1 machine, and seem to run
>>>>>> one-at-a-time <-- I think we can safely parallelize this to 20.
>>>>>>
>>>>>> With current quotas, if we parallelize to 20 concurrent unit tests,
>>>>>> we still have room for 80 other concurrent dataflow jobs to execute, with
>>>>>> 75% of CPU capacity.
>>>>>>
>>>>>> Thoughts? Additional data?
>>>>>>
>>>>>> Thanks,
>>>>>> r
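
For reference, the "48 Dataflow jobs in parallel" figure in Lukasz's message can be reproduced with a short calculation. This is only a minimal sketch, assuming the 16-CPU Jenkins machines Rafael mentions and the "3 * number of CPU cores" worker setting Lukasz cites; the function name is hypothetical and not part of Beam, Gradle, or Jenkins.

```python
# Back-of-the-envelope check of the parallelism numbers quoted in this thread.
# Assumptions (taken from the messages above, not read from any live config):
#   - each Jenkins machine has 16 CPUs
#   - the Gradle max worker count is set to 3 * number of CPU cores
#   - each Gradle worker runs one test fork, i.e. at most one Dataflow job

WORKERS_PER_CORE = 3  # multiplier cited for the Jenkins job configuration


def max_parallel_dataflow_jobs(cpu_cores: int) -> int:
    """Upper bound on concurrent Dataflow jobs from one build (hypothetical helper)."""
    return WORKERS_PER_CORE * cpu_cores


if __name__ == "__main__":
    print(max_parallel_dataflow_jobs(16))  # prints 48
```

Under the same assumptions, allowing N concurrent runs of the job would raise this bound to roughly N times the figure, which is the number any quota increase would need to cover.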