[
https://issues.apache.org/jira/browse/BEAM-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kenneth Knowles updated BEAM-7750:
----------------------------------
This Jira ticket has a pull request attached to it, but is still open. Did the
pull request resolve the issue? If so, could you please mark it resolved? This
will help the project have a clear view of its open issues.
> Pipeline instances are not garbage collected
> --------------------------------------------
>
> Key: BEAM-7750
> URL: https://issues.apache.org/jira/browse/BEAM-7750
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.14.0
> Environment: OS: Debian rodete.
> Tested using:
> Beam versions: 2.13.0, 2.15.0.dev
> Python versions: Python 2.7, Python 3.7.
> Runners: DirectRunner, DataflowRunner.
> Reporter: Alexey Strokach
> Priority: P3
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> It seems that Apache Beam's Pipeline instances are not garbage collected,
> even if the pipelines are finished or cancelled and there are no references
> to those pipelines in the Python interpreter.
> For pipelines executed in a script, this is not a problem. However, for
> interactive pipelines executed inside a Jupyter notebook, this limits how
> well we can track and remove the dependencies of those pipelines. For
> example, if a pipeline reads from some cache, it would be nice to be able to
> delete that cache once there are no references to it from pipelines or the
> global namespace.
> The issue can be reproduced using the following script:
> [https://gist.github.com/ostrokach/a16556dc77c96b87fe23c2fbd8fb6346].
> -----
> On further examination, it turns out that this is due to the
> [{{_PubSubReadEvaluator._subscription_cache}}|https://github.com/apache/beam/blob/27bb5bc7b244809e7f6022adb2730d10204ce4d3/sdks/python/apache_beam/runners/direct/transform_evaluator.py#L418]
> class attribute keeping references to all {{ReadFromPubSub}} transforms.
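
The mechanism described above can be illustrated with a minimal sketch. It is
not Beam code: Transform, LeakyEvaluator, WeakEvaluator and
is_collected_after_use are hypothetical names invented for this example. It
only shows why a dict stored as a class attribute keeps its values alive, and
how a weakref.WeakValueDictionary would behave differently under the same
usage. Whether a weak cache is a suitable fix for the real evaluator is an
assumption, not something stated in this ticket.

```python
import gc
import weakref


class Transform:
    """Hypothetical stand-in for a transform such as ReadFromPubSub."""

    def __init__(self, name):
        self.name = name


class LeakyEvaluator:
    # Class attribute: it lives as long as the class itself, so every value
    # stored here stays strongly referenced for the life of the interpreter.
    _cache = {}

    def __init__(self, transform):
        self._cache[transform.name] = transform


class WeakEvaluator:
    # Weak values: an entry disappears once nothing else references the value.
    _cache = weakref.WeakValueDictionary()

    def __init__(self, transform):
        self._cache[transform.name] = transform


def is_collected_after_use(evaluator_cls):
    """Return True if the transform is freed once local references are gone."""
    transform = Transform("t")
    probe = weakref.ref(transform)  # observes the object without keeping it alive
    evaluator_cls(transform)        # the evaluator instance is discarded immediately
    del transform                   # drop the last local strong reference
    gc.collect()                    # also break any reference cycles
    return probe() is None


leaky_result = is_collected_after_use(LeakyEvaluator)
weak_result = is_collected_after_use(WeakEvaluator)
print(leaky_result, weak_result)
```

In the leaky variant the transform survives because the class-level dict still
holds it, even though no evaluator instance or local variable does. The same
weakref.ref probe can be pointed at a real object in a notebook session to
check whether something is still holding on to it.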