[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951165#comment-16951165 ]

Min Shen commented on SPARK-21492:
----------------------------------

I want to further clarify the scope of the fix in PR 
[https://github.com/apache/spark/pull/25888].

Building on previous work by [~taoluo], this PR extends the fix to SMJ 
codegen.

[~hvanhovell] raised 2 concerns in [~taoluo]'s PR 
[https://github.com/apache/spark/pull/23762]:
 # This only works for an SMJ with Sorts as its direct input.
 # Not sure if it is safe to assume that you can close an underlying child like 
this.

The fix in PR [https://github.com/apache/spark/pull/25888] should address 
concern #2, i.e. it guarantees that closing the iterator of a Sort operator 
early is safe.
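To illustrate what "safe to close early" can mean, here is a minimal Python model, not Spark's Scala code. `MemoryPool` and `SortedBufferIterator` are hypothetical names. The assumptions are that a Sort's buffered rows are charged to a per-task pool and that both the join (early) and the task-end cleanup may call `close()` on the same iterator. Making `close()` idempotent is one way to get that safety; the sketch does not claim this is how the PR implements it.

```python
class MemoryPool:
    """Hypothetical per-task pool that tracks bytes held by operators."""

    def __init__(self):
        self.used = 0

    def acquire(self, n):
        self.used += n

    def release(self, n):
        self.used -= n


class SortedBufferIterator:
    """Hypothetical Sort output: rows are buffered and sorted in pool memory."""

    def __init__(self, rows, pool, bytes_per_row=8):
        self._pool = pool
        self._held = len(rows) * bytes_per_row
        pool.acquire(self._held)
        self._rows = sorted(rows)
        self._pos = 0
        self._closed = False

    def __iter__(self):
        return self

    def __next__(self):
        # A closed iterator behaves as exhausted instead of reading freed rows.
        if self._closed or self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self):
        # Idempotent: only the first call frees the buffer, so an early close
        # by the join followed by task-end cleanup never double-releases.
        if not self._closed:
            self._closed = True
            self._rows = []
            self._pool.release(self._held)
            self._held = 0
```

For example, after `it = SortedBufferIterator([3, 1, 2], pool)` and one `next(it)` (which returns `1`), calling `it.close()` twice leaves `pool.used` at `0`, and further iteration simply yields nothing.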

This fix does not yet propagate requests to close the iterators of an SMJ's 
two child operators down the plan tree to reach the Sort operators.

However, in our experience operating all Spark workloads at LI, the most 
common case of an SMJ not having a Sort as its direct input is when multiple 
SMJs are stacked together.

In this case, even though the requests are not yet propagated, each SMJ can 
still properly handle its own child operators, which still helps release 
resources early.
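A rough Python model of the stacked case follows. `Sort` and `MergeJoin` are hypothetical names, not Spark classes, and the join is a simplified inner join over sorted, unique integer keys. Each join, once one input is exhausted, closes only its direct children that are Sorts and does not forward the request into a child join, i.e. there is no propagation down the tree.

```python
class Sort:
    """Hypothetical Sort operator: holds its sorted buffer until closed."""

    def __init__(self, rows):
        self._rows = sorted(rows)
        self.holding_memory = True

    def __iter__(self):
        return iter(self._rows)

    def close(self):
        self.holding_memory = False


class MergeJoin:
    """Hypothetical sort-merge inner join over children yielding sorted, unique keys."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def _close_local_children(self):
        # Local handling only: close direct children that are Sorts.
        # A child that is itself a MergeJoin is NOT asked to close.
        for child in (self.left, self.right):
            if isinstance(child, Sort):
                child.close()

    def __iter__(self):
        l_it, r_it = iter(self.left), iter(self.right)
        l, r = next(l_it, None), next(r_it, None)
        while l is not None and r is not None:
            if l == r:
                yield l
                l, r = next(l_it, None), next(r_it, None)
            elif l < r:
                l = next(l_it, None)
            else:
                r = next(r_it, None)
        # One side is exhausted, so no further matches are possible:
        # release the direct Sort buffers now rather than at task end.
        self._close_local_children()
```

For example, with `a = Sort([1, 2, 3, 4])`, `b = Sort([2, 3, 4, 5])` and `c = Sort([3, 9, 10])`, `list(MergeJoin(MergeJoin(a, b), c))` returns `[3]` and releases all three Sorts, because the outer join drains the inner one. With `c = Sort([2])` instead, the result is `[2]` and `c` is released, but `a` and `b` keep holding memory: the outer join stops before the inner join finishes, which is the gap that propagating close requests would cover.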

> Memory leak in SortMergeJoin
> ----------------------------
>
>                 Key: SPARK-21492
>                 URL: https://issues.apache.org/jira/browse/SPARK-21492
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
>            Reporter: Zhan Zhang
>            Priority: Major
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory 
> leak caused by the Sort. The memory is not released until the task ends, and 
> cannot be used by other operators, causing a performance drop or OOM.
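The effect described in the issue can be sketched with a toy fixed-size task memory pool in Python. Everything here is hypothetical (`TaskMemoryPool`, `sort_merge_join`, `run_task`, the byte counts), and the model assumes, as the issue states, that without an explicit early release the Sort buffers stay reserved until the task ends. A downstream operator in the same task then competes for the memory the finished join is still holding.

```python
class TaskMemoryPool:
    """Hypothetical fixed-capacity per-task memory pool."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_acquire(self, n):
        if self.used + n > self.capacity:
            return False
        self.used += n
        return True

    def release(self, n):
        self.used -= n


def sort_merge_join(left, right, pool, release_early, bytes_per_row=10):
    """Inner-join two lists of unique keys after 'sorting' them into pool memory."""
    held = (len(left) + len(right)) * bytes_per_row
    if not pool.try_acquire(held):
        raise MemoryError("not enough memory to sort the join inputs")
    left, right = sorted(left), sorted(right)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    # One side is exhausted; the rest of the other side is never read.
    if release_early:
        pool.release(held)  # free both sort buffers as soon as the join is done
        held = 0
    return out, held  # any remaining 'held' bytes stay reserved until task end


def run_task(release_early):
    pool = TaskMemoryPool(capacity=100)
    rows, still_held = sort_merge_join([1, 2], [1, 2, 3, 4, 5], pool, release_early)
    # A downstream operator in the same task needs 50 bytes.
    downstream_ok = pool.try_acquire(50)
    pool.release(still_held)  # task-end cleanup
    return rows, downstream_ok
```

Here `run_task(False)` returns `([1, 2], False)`: the 70 bytes reserved for the sort buffers are still held, so the 50-byte request does not fit in the 100-byte pool. `run_task(True)` returns `([1, 2], True)`.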



--
This message was sent by Atlassian Jira
(v8.3.4#803005)