kennknowles opened a new issue, #19004:
URL: https://github.com/apache/beam/issues/19004

   Recently, a few Python streaming pipelines in the Dataflow apache-beam-testing 
project have been running for more than 5 days. This looks like a leak from the 
Jenkins job that runs e2e integration tests.
   
   The test framework has a pipeline resource cleanup step that applies to all 
integration tests; it is defined in 
[TestDataflowRunner](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L67).
 However, the cancellation may fail in a special case such as the following (from 
[this Jenkins 
run](https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/5636/consoleFull)):
   > 
   > Workflow modification failed. Causes: (c53cc746f7bc7f49): Operation cancel 
not allowed for job 2018-08-01_13_10_24-5019826606522054507. Job is not yet 
ready for canceling. Please retry in a few minutes.
   > 
   Two possible improvements:
   1. Add retries to the framework's cancellation.
   2. Instead of waiting only until the pipeline reaches the RUNNING state 
([here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L57)),
 wait longer to make sure the worker pool has started successfully.
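
   As a rough illustration of approach 1, the retry could look something like the sketch below. This is not the existing TestDataflowRunner code: `cancel_with_retry`, its parameters, the default delays, and the substring used to detect the "not yet ready" error are all assumptions made for the example. The helper takes any zero-argument callable, so it has no dependency on Beam itself.

```python
import time


def cancel_with_retry(cancel, max_attempts=5, initial_delay_secs=30.0,
                      backoff=2.0, is_retryable=None, sleep=time.sleep):
    """Call cancel() until it succeeds or the attempts are exhausted.

    Hypothetical helper: the name, defaults, and error matching are
    assumptions for illustration, not part of the Beam test framework.
    """
    if is_retryable is None:
        # Assumption: the transient failure can be recognized by the text
        # seen in the Jenkins log quoted above.
        def is_retryable(exc):
            return 'not yet ready for canceling' in str(exc)

    delay = initial_delay_secs
    for attempt in range(1, max_attempts + 1):
        try:
            return cancel()
        except Exception as exc:
            # Give up on the last attempt or on an unrelated error.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(delay)
            delay *= backoff  # exponential backoff between attempts
```

   In the cleanup path this might be called as `cancel_with_retry(result.cancel)`, assuming `result` is the pipeline result object the test runner already holds. Matching on the message text is fragile; if the service error exposes a stable status code, checking that would be preferable.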
   
   Imported from Jira 
[BEAM-5108](https://issues.apache.org/jira/browse/BEAM-5108). Original Jira may 
contain additional context.
   Reported by: markflyhigh.

