[ https://issues.apache.org/jira/browse/BEAM-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-7750:
-------------------------------
    Status: Open  (was: Triage Needed)

> Pipeline instances are not garbage collected
> --------------------------------------------
>
>                 Key: BEAM-7750
>                 URL: https://issues.apache.org/jira/browse/BEAM-7750
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.14.0
>         Environment: OS: Debian rodete.
> Tested using: 
> Beam versions: 2.13.0, 2.15.0.dev
> Python versions: Python 2.7, Python 3.7.
> Runners: DirectRunner, DataflowRunner.
>            Reporter: Alexey Strokach
>            Priority: Minor
>
> It seems that Apache Beam's Pipeline instances are not garbage collected, 
> even if the pipelines are finished or cancelled and there are no references 
> to those pipelines in the Python interpreter.
> For pipelines executed in a script, this is not a problem. However, for 
> interactive pipelines executed inside a Jupyter notebook, this limits how 
> well we can track and remove the dependencies of those pipelines. For 
> example, if a pipeline reads from some cache, it would be nice to be able to 
> delete that cache once there are no references to it from pipelines or the 
> global namespace.
> The issue can be reproduced using the following script: 
> [https://github.com/ostrokach/beam-notebooks/blob/48718038e63342a5f3acc31352a6326fffd34888/scripts/error_pipeline_gc.py]
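
The cache-cleanup use case described above depends on the pipeline object actually being collected. Below is a minimal sketch, assuming the standard-library `weakref.finalize` approach. It is not the linked reproduction script. `FakePipeline`, `attach_cache_cleanup` and `run_demo` are hypothetical names used only for illustration. The sketch attaches a cleanup callback to a pipeline-like object and then checks whether that callback fired after the last reference was dropped. If Pipeline instances are never collected, as the report describes, a finalizer attached to a real `apache_beam.Pipeline` would not be expected to fire, and the cache would not be deleted.

```python
import gc
import os
import shutil
import tempfile
import weakref


class FakePipeline:
    """Hypothetical stand-in for a pipeline object that reads from a cache."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir


def attach_cache_cleanup(pipeline, cache_dir):
    """Delete cache_dir once `pipeline` has been garbage collected.

    weakref.finalize keeps only a weak reference to `pipeline`. The callback
    and its arguments must not reference `pipeline` itself; otherwise the
    object would never become unreachable.
    """
    return weakref.finalize(pipeline, shutil.rmtree, cache_dir, ignore_errors=True)


def run_demo():
    """Return (finalizer_alive, cache_exists) after dropping the last reference."""
    cache_dir = tempfile.mkdtemp(prefix="pipeline-cache-")
    pipeline = FakePipeline(cache_dir)
    finalizer = attach_cache_cleanup(pipeline, cache_dir)
    del pipeline   # drop the only strong reference
    gc.collect()   # also break any reference cycles that might keep it alive
    return finalizer.alive, os.path.exists(cache_dir)


if __name__ == "__main__":
    # In CPython the object is collected, so the finalizer ran and the cache is gone.
    print(run_demo())  # (False, False)
```

Because `finalizer.alive` only becomes False after the callback has run, the same check doubles as a simple test for whether an object was garbage collected.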



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
