kennknowles opened a new issue, #19004: URL: https://github.com/apache/beam/issues/19004
Recently, a few Python streaming pipelines in the Dataflow apache-beam-testing project have been running for more than 5 days. This looks like a leak from the Jenkins job that runs the e2e integration tests.

The test framework has a pipeline resource cleanup step that applies to all integration tests; it is defined in [TestDataflowRunner](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L67). However, the cancellation can fail in special cases such as the following (from [this Jenkins run](https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/5636/consoleFull)):

> Workflow modification failed. Causes: (c53cc746f7bc7f49): Operation cancel not allowed for job 2018-08-01_13_10_24-5019826606522054507. Job is not yet ready for canceling. Please retry in a few minutes.

Two possible approaches to improve this:

1. Add retries to the framework's cancellation.
2. Instead of waiting only until the pipeline is in the RUNNING state ([here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L57)), wait longer to make sure the worker pool has started successfully.

Imported from Jira [BEAM-5108](https://issues.apache.org/jira/browse/BEAM-5108). Original Jira may contain additional context.
Reported by: markflyhigh.
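
As a rough illustration of approach 1, a retry wrapper around the cancel call might look like the sketch below. This is not the framework's actual code: `cancel_with_retry`, its parameters, and the `cancel_fn` callable are hypothetical names, and the sketch assumes the "not yet ready for canceling" condition surfaces as an exception from the cancel call.

```python
import time


def cancel_with_retry(cancel_fn, max_attempts=5, delay_seconds=60.0,
                      sleep=time.sleep):
    """Call cancel_fn until it succeeds or max_attempts is exhausted.

    cancel_fn is a hypothetical zero-argument callable that raises when
    the job cannot be cancelled yet. Returns the number of attempts used.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            cancel_fn()
            return attempt
        except Exception as error:  # broad for the sketch; narrow in real code
            last_error = error
            if attempt < max_attempts:
                # The service asks to retry "in a few minutes", so pause
                # before the next attempt.
                sleep(delay_seconds)
    raise RuntimeError(
        "cancel failed after %d attempts" % max_attempts) from last_error
```

In the test runner's cleanup this could presumably be called as `cancel_with_retry(result.cancel)`, where `result` is the pipeline result object; the retry count and delay would need tuning against how long Dataflow takes to accept a cancel request.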
