[
https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675353#comment-16675353
]
Marco Gaido commented on SPARK-24437:
-------------------------------------
[~eyalfa] yes, that is the point: if there is a node failure, or if not all the
data fits in memory so that only part of the final dataset is actually cached,
the broadcast is needed in order to recompute the data. Spark has no means of
knowing when a dataset is no longer needed: that is something the end user has
to decide, uncaching the dataset once it is no longer needed.
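
To make that last point concrete, here is a minimal sketch (not taken from the ticket; the table name "dim_small" is made up) of the explicit uncaching an end user would do, assuming a SparkSession is already available:

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming an existing SparkSession (e.g. the STS session).
// The table name "dim_small" is hypothetical.
val spark = SparkSession.builder().getOrCreate()

// Cache the dataset; only the partitions that fit in memory are actually kept,
// so Spark may still need the original lineage (including any broadcast) to
// recompute the missing parts.
val dim = spark.table("dim_small").cache()
dim.count()  // force the cache to be populated

// ... run the queries that reuse `dim` ...

// Spark cannot tell when `dim` is no longer needed; the user has to release it
// explicitly so the cached blocks and associated broadcast state can be cleaned up.
dim.unpersist()
{code}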
> Memory leak in UnsafeHashedRelation
> -----------------------------------
>
> Key: SPARK-24437
> URL: https://issues.apache.org/jira/browse/SPARK-24437
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: gagan taneja
> Priority: Critical
> Attachments: Screen Shot 2018-05-30 at 2.05.40 PM.png, Screen Shot
> 2018-05-30 at 2.07.22 PM.png, Screen Shot 2018-11-01 at 10.38.30 AM.png
>
>
> There seems to be a memory leak with
> org.apache.spark.sql.execution.joins.UnsafeHashedRelation.
> We have a long-running instance of STS (Spark Thrift Server).
> With each query execution that requires a broadcast join, an UnsafeHashedRelation
> is registered for cleanup in ContextCleaner. However, a reference to the
> UnsafeHashedRelation is held by some other collection, so it never becomes
> eligible for GC, and ContextCleaner is unable to clean it up.
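
For reference, a minimal sketch (not from the report; table and column names are made up) of the kind of query that produces the scenario described above: forcing a broadcast hash join keeps the build side as an UnsafeHashedRelation, and the broadcast is registered with ContextCleaner for eventual cleanup.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Hypothetical tables and join key, purely to illustrate the scenario.
val spark = SparkSession.builder().getOrCreate()

val facts = spark.table("facts")
val dim   = spark.table("dim_small")

// Forcing the small side to be broadcast produces a BroadcastHashJoin, whose
// build (broadcast) side is materialized as an UnsafeHashedRelation.
val joined = facts.join(broadcast(dim), Seq("key"))
joined.count()
{code}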