Of course, fixing https://issues.apache.org/jira/browse/BEAM-8939 is also
crucial to avoiding resource exhaustion, but I didn't have time to do it.
Anyone is welcome to pick it up.

Thanks!

Tue, 10 Dec 2019 at 16:25 Łukasz Gajowy <lgaj...@apache.org> wrote:

> https://github.com/apache/beam/pull/10342 - a PR that skips the tests
> listed above; looking for reviewers.
>
> Thanks!
>
> Tue, 10 Dec 2019 at 13:30 Łukasz Gajowy <lgaj...@apache.org> wrote:
>
>> What I invoked in the apache-beam-testing project:
>>
>> gcloud dataflow jobs list --created-before=-P5H --status=active
>> --format="value(JOB_ID)" --region=us-central|xargs gcloud dataflow jobs
>> cancel
>>
>> Tue, 10 Dec 2019 at 13:28 Łukasz Gajowy <lgaj...@apache.org> wrote:
>>
>>> Hi Kirill,
>>>
>>> We (along with Michał and Kamil) also noticed the problem in the Dataflow
>>> ValidatesRunner suites yesterday. I started investigating and found jobs
>>> that have been running for 5 days and counting. It seems they are not
>>> stopped by the "beam_CancelStaleDataflowJobs" job that runs randomly each
>>> day. Digging deeper, it seems that many of the stale jobs come from the
>>> "https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/" job, which
>>> is currently being ABORTED due to timeout.
>>>
>>> Some tests that do not seem to get killed and keep eating our resources
>>> (I'm not sure this list is exhaustive, but these appear in the Dataflow
>>> console repeatedly):
>>>  - test_reshuffle_preserves_timestamps
>>> <https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487>
>>> (spotted multiple times in the Dataflow console) (Python SDK)
>>>  - test_flatten_same_pcollections
>>> <https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596>
>>> (Python SDK)
>>>  - testPairWithIndexWindowedTimestampedBounded
>>> <https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158>
>>> (Java SDK)
>>>  - testPairWithIndexBasicBounded
>>> <https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125>
>>>
>>> I created https://issues.apache.org/jira/browse/BEAM-8938 to track
>>> tests like this. Right now I'm going to kill all jobs that hang like this
>>> and, in a PR for that issue, ignore the tests I tracked down.
>>>
>>> I think it's good that job_CancelStaleDataflowJobs didn't catch them - I
>>> think that if it did, we would not spot the problem. Is it possible to set
>>> up some alerting on Dataflow instead of automatically cleaning the jobs?
>>> IMO we should fix the tests rather than cancel them.
>>>
>>> Thanks,
>>> Łukasz
>>>
>>>
>>> Tue, 10 Dec 2019 at 00:09 Kirill Kozlov <kirillkoz...@google.com>
>>> wrote:
>>>
>>>> Hello everyone!
>>>>
>>>> It looks like JavaPostCommit Jenkins tests [1] are failing due to CPU
>>>> quota limitations.
>>>> Could someone please look into this?
>>>>
>>>> [1]
>>>> https://builds.apache.org/job/beam_PostCommit_Java/4838/testReport/junit/org.apache.beam.examples.complete/TrafficMaxLaneFlowIT/testE2ETrafficMaxLaneFlow/
>>>>
>>>> --
>>>> Kirill
>>>>
>>>
