[ https://issues.apache.org/jira/browse/SPARK-24437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675719#comment-16675719 ]

Eyal Farago commented on SPARK-24437:
-------------------------------------

[~dvogelbacher] this is a bit puzzling,

Spark will usually choose to broadcast a relation when it is small enough; is that 
not the case here? I'd usually expect the join result to be larger than its 
inputs. Can you share the sizes involved, and how many queries are cached at any 
given moment?

A few workarounds I can think of: tweak the broadcast threshold configuration 
key (spark.sql.autoBroadcastJoinThreshold); if I recall correctly it can also be 
set to disable broadcasting altogether, as sketched below.
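
A minimal sketch of that workaround, assuming a standard SparkSession (the app 
name is illustrative); setting the key to -1 disables broadcast joins entirely:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("broadcast-config").getOrCreate()

// -1 disables broadcast joins; a positive value (in bytes) tunes the threshold instead.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
{code}

Against a running Thrift Server the same setting can be applied per session with 
the SQL statement SET spark.sql.autoBroadcastJoinThreshold=-1.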

 

Another option might be the _checkpoint_ operation, which trades disk space for 
truncated lineage. According to the Scaladocs it's 'evolving' and has been 
available since Spark 2.1, so it might be worth a try (see the sketch below).
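
A minimal sketch of that approach, assuming hypothetical tables facts and dims 
and an illustrative checkpoint path:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-example").getOrCreate()

// checkpoint() needs a directory to write to (path is illustrative).
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

// checkpoint() is eager by default and returns a Dataset whose lineage,
// including any broadcast inputs, has been truncated.
val joined = spark.table("facts").join(spark.table("dims"), "id")
val truncated = joined.checkpoint()
truncated.createOrReplaceTempView("joined_checkpointed")
{code}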

> Memory leak in UnsafeHashedRelation
> -----------------------------------
>
>                 Key: SPARK-24437
>                 URL: https://issues.apache.org/jira/browse/SPARK-24437
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: gagan taneja
>            Priority: Critical
>         Attachments: Screen Shot 2018-05-30 at 2.05.40 PM.png, Screen Shot 
> 2018-05-30 at 2.07.22 PM.png, Screen Shot 2018-11-01 at 10.38.30 AM.png
>
>
> There seems to be a memory leak in 
> org.apache.spark.sql.execution.joins.UnsafeHashedRelation.
> We have a long-running instance of STS (Spark Thrift Server).
> With each query execution that requires a Broadcast Join, an 
> UnsafeHashedRelation is registered for cleanup in ContextCleaner. However, a 
> reference to the UnsafeHashedRelation is also held by some other collection, 
> so it never becomes eligible for GC, and the ContextCleaner is therefore 
> unable to clean it.
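
For context, a hypothetical reproduction sketch of the pattern the report 
describes (table sizes and the loop count are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("broadcast-leak-repro").getOrCreate()
import spark.implicits._

// A small relation that will be broadcast on every join.
val small = (1 to 1000).toDF("id")
val large = spark.range(1000000L).toDF("id")

// Per the report, each iteration registers an UnsafeHashedRelation with the
// ContextCleaner, but a lingering reference keeps it from becoming eligible
// for GC, so driver memory grows over the lifetime of the process.
for (_ <- 1 to 10000) {
  large.join(broadcast(small), "id").count()
}
{code}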


