[ https://issues.apache.org/jira/browse/BEAM-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906554#comment-16906554 ]
Oded Valtzer commented on BEAM-6777:
------------------------------------

I am referring to a Python streaming pipeline using the latest SDK (this also happened in previous versions) on Google Dataflow. I can provide links to pipelines in our project that reached this state, eventually stopped doing any work, and had to be cancelled. Is that what you are looking for?

> SDK Harness Resilience
> ----------------------
>
>                 Key: BEAM-6777
>                 URL: https://issues.apache.org/jira/browse/BEAM-6777
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Sam Rohde
>            Assignee: Yueyang Qiu
>            Priority: Major
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc)
> the job will hang and waste resources. The fix is to add a daemon in the SDK
> Harness and Runner Harness to communicate with Dataflow to restart the VM
> when stuckness is detected.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)