[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679253#comment-16679253 ]

David Vogelbacher commented on SPARK-24437:
-------------------------------------------

[~eyalfa] There might be hundreds of cached dataframes at the same time (they 
do get unpersisted after a while, but only once they are very unlikely to be 
used again).
The thing is that the cached dataframes are generally quite small (~100,000 
rows). However, they might be created by a series of joins, so at times the 
broadcast data backing a specific cached dataframe is bigger than the cached 
dataframe itself.
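To make the pattern concrete, here is a minimal sketch of what such a pipeline might look like; the table and column names are made up, and the dimension tables are assumed to be small enough for the planner to broadcast:

{code:scala}
// Hypothetical pipeline illustrating the pattern described above.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cache-pattern").getOrCreate()

val facts = spark.table("facts")   // large fact table (assumed)
val dimA  = spark.table("dim_a")   // small dimension tables that the planner
val dimB  = spark.table("dim_b")   // will broadcast in the joins below

// Each broadcast join builds an UnsafeHashedRelation on the driver that can
// outlive (and outweigh) the small cached result.
val result = facts
  .join(dimA, "a_id")
  .join(dimB, "b_id")
  .cache()

result.count()     // materialize the cache (~100,000 rows)
// ... much later, once the dataframe is unlikely to be reused:
result.unpersist()
{code}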

This might be a bit of an unusual use case. I do know of the workarounds you 
proposed, but they would significantly harm performance (disabling broadcast 
joins, for example, is not something I want to do).
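For reference, the main workaround on the table is the standard knob that disables broadcast joins globally, which is exactly the performance hit I want to avoid:

{code:scala}
// Disables broadcast joins entirely, forcing sort-merge joins instead.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
{code}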

In this specific case (where the cached dataframes are smaller than the 
broadcast data), it would really be desirable to clean up the broadcast 
data rather than have it stick around on the driver until the dataframe gets 
uncached.
I still don't quite understand why garbage collecting the broadcast item 
would lead to failures when executing the plan later (in case parts of the 
cached data got evicted): couldn't executing the plan always just recompute 
the broadcast variable? [~mgaido]
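For comparison, with explicitly created broadcast variables the public API already distinguishes between dropping the executor-side copies (after which the value can simply be re-shipped from the driver) and destroying the variable for good. A minimal sketch of that lifecycle, using made-up data, loosely analogous to the "recompute on re-execution" behavior I am asking about:

{code:scala}
val sc = spark.sparkContext
val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))

// First use ships the broadcast value to the executors.
sc.parallelize(Seq(1, 2)).map(x => lookup.value(x)).collect()

lookup.unpersist() // executors drop their copies; the driver keeps the value
// A later job simply re-fetches the value from the driver.
sc.parallelize(Seq(1, 2)).map(x => lookup.value(x)).collect()

lookup.destroy()   // frees the driver-side copy too; further use would fail
{code}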


> Memory leak in UnsafeHashedRelation
> -----------------------------------
>
>                 Key: SPARK-24437
>                 URL: https://issues.apache.org/jira/browse/SPARK-24437
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: gagan taneja
>            Priority: Critical
>         Attachments: Screen Shot 2018-05-30 at 2.05.40 PM.png, Screen Shot 
> 2018-05-30 at 2.07.22 PM.png, Screen Shot 2018-11-01 at 10.38.30 AM.png
>
>
> There seems to be a memory leak in 
> org.apache.spark.sql.execution.joins.UnsafeHashedRelation.
> We have a long-running instance of STS (the Spark Thrift Server).
> With each query execution that requires a broadcast join, an 
> UnsafeHashedRelation is registered for cleanup with the ContextCleaner. 
> However, a reference to the UnsafeHashedRelation is also held by some other 
> collection, so it never becomes eligible for GC, and because of this the 
> ContextCleaner is not able to clean it up.


