Hi Beam committers,

I encountered a similar problem today for "Run Dataflow ValidatesRunner":

Dataflow quota error for jobs-per-project quota. Project apache-beam-testing is running 303 jobs.

https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_PR/190/testReport/junit/org.apache.beam.sdk/PipelineTest/testTupleProjectionTransform/
via https://github.com/apache/beam/pull/10554 .

Can somebody with permission check for any unexpectedly long-running jobs?

Regards,
Tomo

On Tue, Dec 10, 2019 at 10:37 AM Łukasz Gajowy <lgaj...@apache.org> wrote:
>
> Of course, fixing https://issues.apache.org/jira/browse/BEAM-8939 is also
> crucial to avoid resource exhaustion, but I didn't have time to do this.
> Anyone, feel free to resolve it.
>
> Thanks!
>
> On Tue, Dec 10, 2019 at 16:25 Łukasz Gajowy <lgaj...@apache.org> wrote:
>>
>> https://github.com/apache/beam/pull/10342 - PR that skips the tests listed
>> above - looking for reviewers
>>
>> Thanks!
>>
>> On Tue, Dec 10, 2019 at 13:30 Łukasz Gajowy <lgaj...@apache.org> wrote:
>>>
>>> What I invoked in the apache-beam-testing project:
>>>
>>> gcloud dataflow jobs list --created-before=-P5H --status=active --format="value(JOB_ID)" --region=us-central | xargs gcloud dataflow jobs cancel
>>>
>>> On Tue, Dec 10, 2019 at 13:28 Łukasz Gajowy <lgaj...@apache.org> wrote:
>>>>
>>>> Hi Kirill,
>>>>
>>>> We (along with Michał and Kamil) noticed the problem as well in the
>>>> Dataflow ValidatesRunner suites yesterday. I started investigating and
>>>> noticed jobs that have been running for 5 days and counting. It seems
>>>> that those are not stopped by the "beam_CancelStaleDataflowJobs" job that
>>>> runs randomly each day. After digging deeper, it seems that many of the
>>>> stale jobs come from the
>>>> "https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/" job, which
>>>> is currently being ABORTED due to timeout.
>>>>
>>>> Some tests (I'm not sure this list is exhaustive, but they appear in the
>>>> Dataflow console repeatedly) that seem not to be killed and eat our
>>>> resources:
>>>> - test_reshuffle_preserves_timestamps (spotted multiple times in the
>>>>   Dataflow console) (Python SDK)
>>>> - test_flatten_same_pcollections (Python SDK)
>>>> - testPairWithIndexWindowedTimestampedBounded (Java SDK)
>>>> - testPairWithIndexBasicBounded
>>>>
>>>> I created https://issues.apache.org/jira/browse/BEAM-8938 to track tests
>>>> like this. Right now I'm going to kill all jobs that hang like this and
>>>> ignore the tests that I tracked down in a PR for the issue I created.
>>>>
>>>> I think it's good that job_CancelStaleDataflowJobs didn't catch them - I
>>>> think that if it had, we would not have spotted the problem. Is it
>>>> possible to set up some alerting on Dataflow instead of automatically
>>>> cleaning up the jobs? IMO we should fix the tests rather than cancel them.
>>>>
>>>> Thanks,
>>>> Łukasz
>>>>
>>>>
>>>> On Tue, Dec 10, 2019 at 00:09 Kirill Kozlov <kirillkoz...@google.com> wrote:
>>>>>
>>>>> Hello everyone!
>>>>>
>>>>> It looks like JavaPostCommit Jenkins tests [1] are failing due to CPU
>>>>> quota limitations.
>>>>> Could someone please look into this?
>>>>>
>>>>> [1]
>>>>> https://builds.apache.org/job/beam_PostCommit_Java/4838/testReport/junit/org.apache.beam.examples.complete/TrafficMaxLaneFlowIT/testE2ETrafficMaxLaneFlow/
>>>>>
>>>>> --
>>>>> Kirill

--
Regards,
Tomo
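
On the alerting question raised in the quoted thread, here is a minimal sketch of a report-only variant of the gcloud command shown there: it lists stale jobs and exits non-zero instead of cancelling them, so a scheduled job could fail visibly when stale jobs exist. The function names and the GCLOUD, MAX_AGE and REGION variables are hypothetical, the us-central1 default is an assumption (the thread's command used us-central), and the -P5H age is copied from the thread. It is not how beam_CancelStaleDataflowJobs works.

```shell
#!/bin/sh
# Sketch only: report stale Dataflow jobs instead of cancelling them.
# GCLOUD, MAX_AGE and REGION are hypothetical knobs, not from the thread.
GCLOUD="${GCLOUD:-gcloud}"        # overridable so the script can be dry-run
MAX_AGE="${MAX_AGE:--P5H}"        # same age filter as the thread's command
REGION="${REGION:-us-central1}"   # assumption; the thread used us-central

# Print the IDs of active jobs created before MAX_AGE, one per line.
list_stale_jobs() {
  "$GCLOUD" dataflow jobs list \
    --created-before="$MAX_AGE" \
    --status=active \
    --region="$REGION" \
    --format="value(JOB_ID)"
}

# Return 0 if nothing is stale, 1 if stale jobs exist, 2 if listing failed.
report_stale_jobs() {
  ids=$(list_stale_jobs) || return 2
  if [ -z "$ids" ]; then
    echo "no stale jobs"
    return 0
  fi
  count=$(printf '%s\n' "$ids" | wc -l | tr -d ' ')
  echo "ALERT: $count stale job(s) older than $MAX_AGE:"
  printf '%s\n' "$ids"
  return 1
}
```

Calling `report_stale_jobs` from a scheduled build would then turn stale jobs into a failed build rather than a silent cleanup. How that failure gets routed to people (email, chat) is left open here.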