Hi Beam committers,

I encountered a similar problem today for "Run Dataflow ValidatesRunner":
  Dataflow quota error for jobs-per-project quota. Project
apache-beam-testing is running 303 jobs.
  
https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_PR/190/testReport/junit/org.apache.beam.sdk/PipelineTest/testTupleProjectionTransform/
(triggered via https://github.com/apache/beam/pull/10554).

Could someone with the necessary permissions check for unexpected long-running jobs?

Regards,
Tomo

On Tue, Dec 10, 2019 at 10:37 AM Łukasz Gajowy <lgaj...@apache.org> wrote:
>
> Of course, fixing https://issues.apache.org/jira/browse/BEAM-8939 is also
> crucial to avoid resource exhaustion, but I didn't have time to do it.
> Anyone, feel free to pick it up.
>
> Thanks!
>
> On Tue, Dec 10, 2019 at 4:25 PM Łukasz Gajowy <lgaj...@apache.org> wrote:
>>
>> https://github.com/apache/beam/pull/10342 is a PR that skips the tests
>> listed above. I'm looking for reviewers.
>>
>> Thanks!
>>
>> On Tue, Dec 10, 2019 at 1:30 PM Łukasz Gajowy <lgaj...@apache.org> wrote:
>>>
>>> Here is the command I ran in the apache-beam-testing project:
>>>
>>> gcloud dataflow jobs list --created-before=-P5H --status=active --format="value(JOB_ID)" --region=us-central | xargs gcloud dataflow jobs cancel
>>>
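As a rough illustration of the selection rule in the command above (active jobs created more than five hours ago), here is a minimal Python sketch. It does not call gcloud or the Dataflow API; the `Job` record, its fields, the set of states treated as "active", and the `stale_job_ids` helper are all hypothetical stand-ins for what `gcloud dataflow jobs list` reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class Job:
    # Hypothetical record mirroring a few columns of `gcloud dataflow jobs list`.
    job_id: str
    state: str             # e.g. "Running", "Done", "Cancelled"
    create_time: datetime  # timezone-aware creation timestamp


# Assumption: illustrative set of states treated as "active"; this is not
# gcloud's exact definition of --status=active.
ACTIVE_STATES = {"Running", "Pending", "Queued"}


def stale_job_ids(jobs: Iterable[Job], now: datetime,
                  max_age: timedelta = timedelta(hours=5)) -> List[str]:
    """Return IDs of active jobs created before `now - max_age`."""
    cutoff = now - max_age
    return [j.job_id for j in jobs
            if j.state in ACTIVE_STATES and j.create_time < cutoff]


if __name__ == "__main__":
    now = datetime(2019, 12, 10, 13, 30, tzinfo=timezone.utc)
    jobs = [
        Job("job-a", "Running", now - timedelta(days=5)),
        Job("job-b", "Running", now - timedelta(hours=1)),
        Job("job-c", "Done", now - timedelta(days=2)),
    ]
    # Print one ID per line, the shape that `xargs` expects in the pipeline above.
    for job_id in stale_job_ids(jobs, now):
        print(job_id)
```

Keeping the selection separate from the action means the same list could feed an alert instead of `gcloud dataflow jobs cancel`, which is closer to the alerting approach suggested later in the thread.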
>>> On Tue, Dec 10, 2019 at 1:28 PM Łukasz Gajowy <lgaj...@apache.org> wrote:
>>>>
>>>> Hi Kirill,
>>>>
>>>> Michał, Kamil and I also noticed the problem in the Dataflow
>>>> ValidatesRunner suites yesterday. I started investigating and noticed
>>>> jobs that have been running for 5 days and counting. It seems they are
>>>> not stopped by the "beam_CancelStaleDataflowJobs" job, which runs at a
>>>> random time each day. Digging deeper, it seems that many of the stale
>>>> jobs come from the
>>>> "https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/" job,
>>>> which is currently being ABORTED due to a timeout.
>>>>
>>>> Some tests seem not to be killed and keep eating our resources. I'm not
>>>> sure this list is exhaustive, but these appear repeatedly in the
>>>> Dataflow console:
>>>>  - test_reshuffle_preserves_timestamps (Python SDK; spotted multiple
>>>> times in the Dataflow console)
>>>>  - test_flatten_same_pcollections (Python SDK)
>>>>  - testPairWithIndexWindowedTimestampedBounded (Java SDK)
>>>>  - testPairWithIndexBasicBounded (Java SDK)
>>>>
>>>> I created https://issues.apache.org/jira/browse/BEAM-8938 to track tests
>>>> like these. Right now I'm going to kill all the jobs that hang like this
>>>> and, in a PR for that issue, ignore the tests I tracked down.
>>>>
>>>> I think it's good that beam_CancelStaleDataflowJobs didn't catch them: if
>>>> it had, we would not have spotted the problem. Is it possible to set up
>>>> alerting on Dataflow instead of automatically cleaning up the jobs? IMO
>>>> we should fix the tests rather than cancel their jobs.
>>>>
>>>> Thanks,
>>>> Łukasz
>>>>
>>>>
>>>> On Tue, Dec 10, 2019 at 12:09 AM Kirill Kozlov <kirillkoz...@google.com>
>>>> wrote:
>>>>>
>>>>> Hello everyone!
>>>>>
>>>>> It looks like JavaPostCommit Jenkins tests [1] are failing due to CPU 
>>>>> quota limitations.
>>>>> Could someone please look into this?
>>>>>
>>>>> [1] 
>>>>> https://builds.apache.org/job/beam_PostCommit_Java/4838/testReport/junit/org.apache.beam.examples.complete/TrafficMaxLaneFlowIT/testE2ETrafficMaxLaneFlow/
>>>>>
>>>>> --
>>>>> Kirill

