[ https://issues.apache.org/jira/browse/BEAM-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Beam JIRA Bot updated BEAM-5108:
--------------------------------
Labels: (was: stale-P2)
> Improve Python test framework to prevent streaming pipeline leaks
> -----------------------------------------------------------------
>
> Key: BEAM-5108
> URL: https://issues.apache.org/jira/browse/BEAM-5108
> Project: Beam
> Issue Type: Task
> Components: testing
> Reporter: Mark Liu
> Priority: P3
>
> Recently, a few Python streaming pipelines in the Dataflow apache-beam-testing
> project ran for more than 5 days. This looks like a leak from the Jenkins job
> that runs the end-to-end (e2e) integration tests.
> The test framework has a pipeline resource cleanup step that applies to all
> integration tests; it is defined in
> [TestDataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L67].
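> A minimal sketch of the general cleanup pattern, assuming only a job object with a `cancel()` method. `FakeJob` and `cancel_on_exit` are hypothetical illustrations, not the actual TestDataflowRunner code. The point it shows: a `finally` clause guarantees that `cancel()` is *called*, but not that it *succeeds*, so a rejected cancel still leaks the job.

```python
import contextlib


class FakeJob:
    """Hypothetical stand-in for a running streaming job (not a Beam class)."""

    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


@contextlib.contextmanager
def cancel_on_exit(job):
    """Yield the job to the test body, then always try to cancel it.

    The finally clause runs whether the test body passes or raises, but if
    job.cancel() itself raises, the job keeps running and leaks.
    """
    try:
        yield job
    finally:
        job.cancel()


job = FakeJob()
try:
    with cancel_on_exit(job):
        raise AssertionError("simulated test failure")
except AssertionError:
    pass
print(job.cancelled)  # True
```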
> However, cancellation can fail in some special cases, such as the following (from
> [this Jenkins
> run|https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/5636/consoleFull]):
> {quote}
> Workflow modification failed. Causes: (c53cc746f7bc7f49): Operation cancel
> not allowed for job 2018-08-01_13_10_24-5019826606522054507. Job is not yet
> ready for canceling. Please retry in a few minutes.
> {quote}
> There are two possible approaches to improving this:
> 1. Add retries to the framework's cancellation step.
> 2. Instead of waiting only until the pipeline reaches the RUNNING state
> ([here|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L57]),
> wait longer to make sure the worker pool has started successfully.
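> Both approaches can be sketched with small generic helpers. This is an illustration under stated assumptions, not Beam's API: `JobNotReadyError`, `cancel_with_retry`, `wait_until` and `FakeJob` are hypothetical names. In real code the "not yet ready" condition would have to be recognized from whatever error the Dataflow client actually raises, and the approach-2 predicate (for example "worker pool is up") would need a real status check that this sketch does not specify. Sleep and clock functions are injectable so the helpers can be tested without waiting.

```python
import time


class JobNotReadyError(Exception):
    """Hypothetical error for Dataflow's 'Job is not yet ready for canceling'."""


def cancel_with_retry(cancel_fn, max_attempts=5, initial_delay_s=30.0,
                      backoff=2.0, sleep=time.sleep):
    """Approach 1: retry cancel_fn with exponential backoff.

    Only JobNotReadyError is retried; other exceptions propagate at once.
    Returns the number of attempts it took to succeed.
    """
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            cancel_fn()
            return attempt
        except JobNotReadyError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay)
            delay *= backoff


def wait_until(predicate, timeout_s, poll_interval_s=10.0,
               clock=time.monotonic, sleep=time.sleep):
    """Approach 2: poll predicate() until it is True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeJob:
    """Hypothetical job that rejects the first `failures` cancel calls."""

    def __init__(self, failures):
        self.failures = failures
        self.cancelled = False

    def cancel(self):
        if self.failures > 0:
            self.failures -= 1
            raise JobNotReadyError("Job is not yet ready for canceling.")
        self.cancelled = True


job = FakeJob(failures=2)
delays = []
attempts = cancel_with_retry(job.cancel, sleep=delays.append)
print(attempts, job.cancelled, delays)  # 3 True [30.0, 60.0]
```

> Retrying only on the "not ready" error keeps genuine failures visible, while the bounded attempt count prevents the test from hanging indefinitely.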