[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16674827#comment-16674827 ]

Marco Gaido commented on SPARK-24437:
-------------------------------------

Hi [~dvogelbacher], thanks for your comment and your analysis. I don't really 
agree with [~eyalfa], in the sense that what is referenced here is the 
broadcasted item. If it were garbage collected while the plan is still 
executable, we would get an exception during the execution of the plan.

In this case, it seems that you are caching a dataset without ever un-caching 
it, which leads to the memory leak. But this is more an application leak than 
a Spark one. So please do uncache your datasets when you no longer need them, 
and the problem should be solved.
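
As a minimal sketch of what I mean (names like {{spark}}, {{df}} and the input 
path are placeholders, not from the reported application):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("uncache-example").getOrCreate()

    // Cache a dataset that is reused across several queries
    val df = spark.read.parquet("/path/to/data")
    df.cache()

    // ... run the queries that need the cached dataset ...

    // Explicitly release the cached blocks once they are no longer needed,
    // so the ContextCleaner can drop the associated broadcast state
    df.unpersist()

    // Alternatively, drop everything cached in this session at once:
    // spark.catalog.clearCache()

Without the {{unpersist()}} (or {{clearCache()}}), the cached plan keeps 
referencing the broadcasted relation, which is what you are seeing in the heap 
dumps.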

> Memory leak in UnsafeHashedRelation
> -----------------------------------
>
>                 Key: SPARK-24437
>                 URL: https://issues.apache.org/jira/browse/SPARK-24437
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: gagan taneja
>            Priority: Critical
>         Attachments: Screen Shot 2018-05-30 at 2.05.40 PM.png, Screen Shot 
> 2018-05-30 at 2.07.22 PM.png, Screen Shot 2018-11-01 at 10.38.30 AM.png
>
>
> There seems to be a memory leak in 
> org.apache.spark.sql.execution.joins.UnsafeHashedRelation.
> We have a long-running instance of STS (Spark Thrift Server).
> With each query execution requiring a broadcast join, an UnsafeHashedRelation is 
> registered for cleanup with the ContextCleaner. This reference to the 
> UnsafeHashedRelation is being held by some other collection and never becomes 
> eligible for GC, so the ContextCleaner is not able to clean it.



