Alexey Strokach created BEAM-7750:
-------------------------------------
Summary: Pipeline instances are not garbage collected
Key: BEAM-7750
URL: https://issues.apache.org/jira/browse/BEAM-7750
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Affects Versions: 2.14.0
Environment: OS: Debian rodete.
Tested using:
Beam versions: 2.13.0, 2.15.0.dev
Python versions: Python 2.7, Python 3.7.
Runners: DirectRunner, DataflowRunner.
Reporter: Alexey Strokach
Apache Beam Pipeline instances do not appear to be garbage collected, even
after the pipelines have finished or been cancelled and no references to them
remain in the Python interpreter.
For pipelines executed in a script, this is not a problem. However, for
interactive pipelines executed inside a Jupyter notebook, this limits how well
we can track and remove the dependencies of those pipelines. For example, if a
pipeline reads from some cache, it would be nice to be able to delete that
cache once there are no references to it from a pipeline or the global
namespace.
The issue can be reproduced using the following script:
https://github.com/ostrokach/beam-notebooks/blob/48718038e63342a5f3acc31352a6326fffd34888/scripts/error_pipeline_gc.py
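For readers who cannot open the link, the kind of check involved can be
sketched with the standard library alone. This is not the linked script, only
a minimal illustration of how one might test whether an object is freed once
the last strong reference is dropped. FakePipeline, is_collected_after_release
and make_leaky_factory are hypothetical names introduced here; none of them
are part of Beam.

```python
import gc
import weakref


class FakePipeline(object):
    """Hypothetical stand-in for a pipeline object; not part of Beam."""

    def __init__(self):
        self.steps = []


def is_collected_after_release(factory):
    """Return True if the object built by `factory` is freed once dropped."""
    obj = factory()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                 # drop the only strong reference held here
    gc.collect()            # also collect objects kept alive only by cycles
    return ref() is None


def make_leaky_factory(registry):
    """Build a factory whose objects stay referenced from `registry`."""
    def factory():
        obj = FakePipeline()
        registry.append(obj)  # simulates a lingering hidden reference
        return obj
    return factory


registry = []
# A plain object with no other referrers should be collected.
collected = is_collected_after_release(FakePipeline)
# An object still held by the registry should survive collection.
leaked = not is_collected_after_release(make_leaky_factory(registry))
```

Assuming apache_beam is importable, the same helper could be given a factory
that builds, runs and returns a beam.Pipeline. Going by the behaviour described
above, the weak reference would be expected to stay alive in that case, and
gc.get_referrers(ref()) is one way to look for whatever still holds the
object.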
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)