Mark Liu created BEAM-5108:
------------------------------
Summary: Python test framework should prevent streaming pipeline
leaks
Key: BEAM-5108
URL: https://issues.apache.org/jira/browse/BEAM-5108
Project: Beam
Issue Type: Task
Components: testing
Reporter: Mark Liu
Recently, a few Python streaming pipelines in the Dataflow apache-beam-testing
project have run for more than 5 days. These look like leaks from the Jenkins job
that runs the e2e integration tests.
The test framework has a pipeline resource cleanup step that applies to all
integration tests. It is defined in
[TestDataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L67].
However, the cancellation may fail in special cases, such as the following (from
[this Jenkins
run|https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/5636/consoleFull]):
{quote}
Workflow modification failed. Causes: (c53cc746f7bc7f49): Operation cancel not
allowed for job 2018-08-01_13_10_24-5019826606522054507. Job is not yet ready
for canceling. Please retry in a few minutes.
{quote}
Two possible approaches to improve the test infra:
1. Add retries to the framework's cancellation.
2. Instead of waiting only until the pipeline is in the RUNNING state
([here|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L57]),
wait longer to make sure the worker pool has started successfully.
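
A rough sketch of what the two approaches could look like, for discussion only. This is not the existing TestDataflowRunner code. The names cancel_with_retry, wait_until, cancel_fn and is_ready_fn are hypothetical, and the sketch assumes the "not yet ready for canceling" error surfaces as a Python exception from the cancel call. How to detect that the worker pool has started is left to the caller-supplied is_ready_fn.

```python
import time


def cancel_with_retry(cancel_fn, max_attempts=5, initial_delay_secs=30.0,
                      backoff=2.0, sleep_fn=time.sleep):
    """Approach 1: retry a cancel call with exponential backoff.

    cancel_fn is a hypothetical zero-argument callable that cancels the job
    and raises if the service rejects the request. Assumption: any exception
    is treated as retryable; a real version should only retry the
    "not yet ready for canceling" case.
    """
    delay = initial_delay_secs
    for attempt in range(1, max_attempts + 1):
        try:
            return cancel_fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Give up and surface the last error.
            sleep_fn(delay)
            delay *= backoff


def wait_until(is_ready_fn, timeout_secs=600.0, poll_interval_secs=15.0,
               sleep_fn=time.sleep):
    """Approach 2: poll a readiness check before running the test body.

    is_ready_fn is a hypothetical zero-argument callable returning True once
    the job is safe to cancel (e.g. the worker pool has started). Returns
    True if it became ready, False once timeout_secs of sleeping has elapsed.
    """
    waited = 0.0
    while True:
        if is_ready_fn():
            return True
        if waited >= timeout_secs:
            return False
        sleep_fn(poll_interval_secs)
        waited += poll_interval_secs
```

The two helpers are independent. Retrying the cancel is the smaller change, since it does not require knowing what "ready" means; the readiness wait avoids the failed cancel in the first place but needs a reliable signal from the service.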