[ 
https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829545#comment-16829545
 ] 

Tao Luo commented on SPARK-21492:
---------------------------------

The problem is that the task won't complete because memory is being leaked (you 
can see this from the simple example above).
Secondly, it's not just the last page; it's every page holding records from 
unused iterators. 
Can we increase the priority of this bug? SMJ is a pretty integral part of 
Spark SQL, and it seems like no progress is being made on this bug, which is 
causing jobs to fail and has no workaround. 

I don't think that it's a hack. The argument seems to be that limit also needs 
to be fixed, so we shouldn't fix this bug until that is fixed too. Meanwhile, 
this issue has been lingering since at least July 2017. 
This would fix a memory leak and improve performance by avoiding unnecessary 
spills. Why don't we target this fix for SMJ first, since it's pretty 
isolated to UnsafeExternalRowIterator in SMJ, run it through all the test 
cases, and make additional changes as necessary in the future? 

I've been porting [this PR|https://github.com/apache/spark/pull/23762] onto my 
production Spark cluster for the last 3 months, but I'm hoping we can get some 
sort of fix into 3.0 at least.

I started a discussion thread here; hopefully people can jump in:
http://apache-spark-developers-list.1001551.n3.nabble.com/Memory-leak-in-SortMergeJoin-td27152.html


> Memory leak in SortMergeJoin
> ----------------------------
>
>                 Key: SPARK-21492
>                 URL: https://issues.apache.org/jira/browse/SPARK-21492
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
>            Reporter: Zhan Zhang
>            Priority: Major
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory 
> leak caused by the Sort. The memory is not released until the task ends and 
> cannot be used by other operators, causing a performance drop or OOM.


