[ 
https://issues.apache.org/jira/browse/SPARK-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-13872.
----------------------------------
    Resolution: Incomplete

> Memory leak in SortMergeOuterJoin
> ---------------------------------
>
>                 Key: SPARK-13872
>                 URL: https://issues.apache.org/jira/browse/SPARK-13872
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Ian
>            Priority: Major
>              Labels: bulk-closed
>         Attachments: Screen Shot 2016-03-11 at 5.42.32 PM.png
>
>
> SortMergeJoin composes its partition/iterator from 
> org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to 
> UnsafeExternalRowSorter.
> UnsafeExternalRowSorter's implementation cleans up the resources when:
> 1. org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully 
> iterated.
> 2. the task has finished executing.
> In the outer join case of SortMergeJoin, when the left or right iterator is 
> not fully iterated, the only chance for the resources to be cleaned up is at 
> the end of the Spark task run.
> This is probably fine most of the time. However, when a SortMergeOuterJoin is 
> nested within a CartesianProduct, the deferred resource cleanup allows a 
> non-negligible memory leak: CartesianRDD's looping iteration creates a new 
> partially consumed iterator on every pass, so the unreleased resources 
> accumulate until the task ends.
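
The accumulation described above can be illustrated with a small standalone
simulation. This is only a sketch and not Spark code: TaskScope and
SortedIterator are hypothetical stand-ins for the task context and the
sorter-backed row iterator, and "memory" is reduced to a counter of sorters
that have not yet been released. It assumes, as the report states, that
release happens only when an iterator is exhausted or when the task
completes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class DeferredCleanupSketch {

    /** Hypothetical stand-in for a task context that runs cleanup callbacks at task end. */
    public static final class TaskScope {
        private final List<Runnable> completionCallbacks = new ArrayList<>();
        public int liveSorters = 0;      // sorters whose resources are still held
        public int peakLiveSorters = 0;  // highest value liveSorters ever reached

        void onCompletion(Runnable callback) {
            completionCallbacks.add(callback);
        }

        /** Simulates the end of the task: every registered cleanup runs. */
        public void complete() {
            for (Runnable callback : completionCallbacks) {
                callback.run();
            }
            completionCallbacks.clear();
        }
    }

    /** Hypothetical stand-in for a sorted iterator whose sorter holds resources. */
    public static final class SortedIterator implements Iterator<Integer> {
        private final TaskScope scope;
        private final int size;
        private int position = 0;
        private boolean released = false;

        SortedIterator(TaskScope scope, int size) {
            this.scope = scope;
            this.size = size;
            scope.liveSorters++;
            scope.peakLiveSorters = Math.max(scope.peakLiveSorters, scope.liveSorters);
            // Cleanup path 2: release when the task completes.
            scope.onCompletion(this::release);
        }

        @Override
        public boolean hasNext() {
            if (position < size) {
                return true;
            }
            // Cleanup path 1: release once the iterator is fully consumed.
            release();
            return false;
        }

        @Override
        public Integer next() {
            if (position >= size) {
                throw new NoSuchElementException();
            }
            return position++;
        }

        private void release() {
            if (!released) {
                released = true;
                scope.liveSorters--;
            }
        }
    }

    /**
     * Simulates an outer loop (the Cartesian side) that builds a fresh sorted
     * iterator on every pass. If drain is false, only one row is read and the
     * iterator is abandoned, as can happen on one side of an outer join.
     */
    public static TaskScope run(int outerRows, int sorterSize, boolean drain) {
        TaskScope scope = new TaskScope();
        for (int i = 0; i < outerRows; i++) {
            SortedIterator it = new SortedIterator(scope, sorterSize);
            if (drain) {
                while (it.hasNext()) {
                    it.next();
                }
            } else if (it.hasNext()) {
                it.next();
            }
        }
        return scope;
    }
}
```

In this model, run(100, 10, false) leaves 100 sorters live until complete()
is called, whereas run(100, 10, true) never has more than one live at a
time. That difference is the amplification the report attributes to the
enclosing loop.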



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

