[
https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829545#comment-16829545
]
Tao Luo commented on SPARK-21492:
---------------------------------
The problem is that the task won't complete because of the memory being leaked
(you can see this from the simple example above).
Secondly, it's not just the last page; it's every page with records from unused
iterators.
Can we increase the priority of this bug? SMJ is a pretty integral part of
Spark SQL, and it seems like no progress is being made on this bug, which is
causing jobs to fail and has no workaround.
I don't think that it's a hack. The argument seems to be that limit also needs
to be fixed, so let's not fix this bug until that is fixed too. Meanwhile, this
issue has been lingering since at least July 2017.
This would fix a memory leak and improve performance by avoiding unnecessary
spills. Why don't we target this fix for SMJ first, since it's pretty
isolated to UnsafeExternalRowIterator in SMJ, run it through all the test
cases, and make additional changes as necessary in the future?
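To make the argument concrete, here is a toy model in Python. It is not Spark
code and does not use Spark's classes: ToySorter, merge_join, page_size and
eager_cleanup are hypothetical names chosen for illustration. It assumes a
sorter that frees its pages only when its iterator is fully consumed, and a
merge join over unique keys that stops as soon as either side runs out.

```python
class ToySorter:
    """Toy stand-in for a sort buffer: holds 'pages' until cleanup() runs."""

    def __init__(self, rows, page_size=2):
        self.rows = sorted(rows)
        # Number of pages "allocated" to hold the sorted rows (ceiling division).
        self.pages_held = -(-len(self.rows) // page_size)

    def __iter__(self):
        for row in self.rows:
            yield row
        # Pages are released only once the iterator has been fully consumed.
        self.cleanup()

    def cleanup(self):
        self.pages_held = 0


def merge_join(left, right, eager_cleanup=False):
    """Inner join on equal keys; assumes unique keys on both sides."""
    left_iter, right_iter = iter(left), iter(right)
    matches = []
    lrow, rrow = next(left_iter, None), next(right_iter, None)
    # The join stops as soon as either side is exhausted.
    while lrow is not None and rrow is not None:
        if lrow == rrow:
            matches.append(lrow)
            lrow, rrow = next(left_iter, None), next(right_iter, None)
        elif lrow < rrow:
            lrow = next(left_iter, None)
        else:
            rrow = next(right_iter, None)
    if eager_cleanup:
        # Sketch of the proposed behaviour: free both sides when the join ends.
        left.cleanup()
        right.cleanup()
    return matches


left, right = ToySorter([1, 2]), ToySorter(range(1, 9))
merge_join(left, right)
print(left.pages_held, right.pages_held)  # prints "0 4"
```

In this model the right side still holds all four of its pages after the join
returns, not just the last one, because its iterator was never exhausted.
Calling merge_join with eager_cleanup=True on fresh sorters leaves both sides
at zero pages.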
I've been porting [this PR|https://github.com/apache/spark/pull/23762] onto my
production Spark cluster for the last 3 months, but I'm hoping we can get some
sort of fix into 3.0 at least.
I started a discussion thread here, hopefully people can jump in:
http://apache-spark-developers-list.1001551.n3.nabble.com/Memory-leak-in-SortMergeJoin-td27152.html
> Memory leak in SortMergeJoin
> ----------------------------
>
> Key: SPARK-21492
> URL: https://issues.apache.org/jira/browse/SPARK-21492
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
> Reporter: Zhan Zhang
> Priority: Major
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory leak
> caused by the Sort. The memory is not released until the task ends, and cannot
> be used by other operators, causing a performance drop or OOM.