[
https://issues.apache.org/jira/browse/BEAM-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on BEAM-7873 started by Hannah Jiang.
------------------------------------------
> FnApi with Subprocess runner hangs frequently when running with multi workers
> -----------------------------------------------------------------------------
>
> Key: BEAM-7873
> URL: https://issues.apache.org/jira/browse/BEAM-7873
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Hannah Jiang
> Assignee: Hannah Jiang
> Priority: Major
> Fix For: 2.15.0
>
>
> The pipeline hangs at
> [p.wait()|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service.py#L208]
> when it is shut down. I looked into the source code of subprocess:
> [py27|https://github.com/enthought/Python-2.7.3/blob/master/Lib/subprocess.py#L1286]
> does not take any lock while waiting, whereas
> [py3|https://github.com/python/cpython/blob/3.7/Lib/subprocess.py#L1592]
> does. Py3 also added locks at other places in Popen(), so any of the
> places that are unlocked in py2 may contribute to the problem.
> I think this is the root cause of the hang.
> A workaround is to sleep 0.1 second, or even better 0.5 second, between
> calls to Popen() so that it does not deadlock. I ran wordcount.py 1000 times
> with 2 workers, and sleeping 0.1 second worked fine.
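
A minimal sketch of the workaround described above, assuming the workers are plain subprocesses started one after another. The helper names start_workers_staggered and wait_for_workers are hypothetical and are not part of Beam; the actual code in local_job_service.py may look different.

```python
import subprocess
import sys
import time


def start_workers_staggered(commands, delay_seconds=0.1):
    """Start one subprocess per command, sleeping between Popen() calls.

    The delay is the workaround from the issue description (0.1 second,
    or 0.5 second to be safer). It is not a real fix for the missing
    locking in the py2 subprocess module.
    """
    processes = []
    for index, command in enumerate(commands):
        if index > 0:
            # Pause before every Popen() call except the first one.
            time.sleep(delay_seconds)
        processes.append(subprocess.Popen(command))
    return processes


def wait_for_workers(processes):
    """Wait for every worker to exit and return the exit codes in order."""
    return [process.wait() for process in processes]
```

For example, two trivial workers could be started with start_workers_staggered([[sys.executable, "-c", "pass"]] * 2) and then passed to wait_for_workers. The sleep only makes it less likely that two Popen() calls overlap, so it should be treated as a stopgap.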
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)