Yu Chen created FLINK-33613:
-------------------------------
Summary: Python UDF Runner process leak in Process Mode
Key: FLINK-33613
URL: https://issues.apache.org/jira/browse/FLINK-33613
Project: Flink
Issue Type: Bug
Components: API / Python
Affects Versions: 1.17.0
Reporter: Yu Chen
Attachments: ps-ef.txt, streaming_word_count-1.py
While working with PyFlink, we found that in Process Mode the Python UDF
process may leak after a job failover. The number of leaked processes and
their threads on the host machine keeps growing, which eventually results in
failure to create new threads.
You can try to reproduce it with the attached test job
`streaming_word_count-1.py`.
(Note that the job fails over continuously, and you can watch the processes
leak with `ps -ef` on the Taskmanager.)
Our test environment:
* K8S Application Mode
* 4 Taskmanagers with 12 slots/TM
* Job's parallelism was set to 48
The number of UDF processes (`pyflink.fn_execution.beam.beam_boot`) should
match the job's parallelism (48), but we found 180 such processes after
several failovers.
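
For anyone checking their own cluster, the comparison above can be sketched as
a small script. This is only a rough sketch, not part of the attached job: the
helper names (`count_udf_processes`, `leaked_udf_processes`, `snapshot_ps`) are
hypothetical, and it assumes the UDF workers show up in `ps -ef` with
`pyflink.fn_execution.beam.beam_boot` in their command line, as in the
attached `ps-ef.txt`.

```python
import subprocess

# Module name that appears in the command line of each Python UDF worker.
UDF_MARKER = "pyflink.fn_execution.beam.beam_boot"


def count_udf_processes(ps_output: str, marker: str = UDF_MARKER) -> int:
    """Count the lines of `ps -ef` output whose command contains the marker."""
    return sum(1 for line in ps_output.splitlines() if marker in line)


def leaked_udf_processes(ps_output: str, expected: int) -> int:
    """Return how many UDF processes exceed the expected count (never negative)."""
    return max(0, count_udf_processes(ps_output) - expected)


def snapshot_ps() -> str:
    """Capture `ps -ef` output on the current host (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example usage on a Taskmanager host; `expected` is whatever number of UDF
# workers you expect on that host (an assumption you supply yourself):
#   print(leaked_udf_processes(snapshot_ps(), expected=12))
```

Running this before and after a few failovers should show whether the count
keeps growing beyond the expected value.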
--
This message was sent by Atlassian Jira
(v8.20.10#820010)