[
https://issues.apache.org/jira/browse/SPARK-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-13872.
----------------------------------
Resolution: Incomplete
> Memory leak in SortMergeOuterJoin
> ---------------------------------
>
> Key: SPARK-13872
> URL: https://issues.apache.org/jira/browse/SPARK-13872
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Ian
> Priority: Major
> Labels: bulk-closed
> Attachments: Screen Shot 2016-03-11 at 5.42.32 PM.png
>
>
> SortMergeJoin builds its partition iterators from
> org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to
> UnsafeExternalRowSorter.
> UnsafeExternalRowSorter's implementation cleans up its resources when either:
> 1. the org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully
> iterated, or
> 2. the task finishes execution.
> In the outer join case of SortMergeJoin, when the left or right iterator is
> not fully iterated, the only chance for the resources to be cleaned up is at
> the end of the Spark task run.
> This is probably OK most of the time. However, when a SortMergeOuterJoin is
> nested within a CartesianProduct, the deferred resource cleanup causes a
> non-negligible memory leak that accumulates across the iterations driven by
> CartesianRDD.
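
The accumulation described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark code: TaskContext, SortedIterator, and run_task are hypothetical names invented for the illustration, and the "buffer" is an ordinary list standing in for the sorter's memory. It assumes the two cleanup triggers listed in the report (full iteration, task completion) and an outer loop that rebuilds the inner iterator once per outer row.

```python
class TaskContext:
    """Hypothetical stand-in for a task context with completion callbacks."""

    def __init__(self):
        self._callbacks = []

    def add_completion_listener(self, fn):
        self._callbacks.append(fn)

    def mark_complete(self):
        # Run every registered cleanup callback at the end of the task.
        for fn in self._callbacks:
            fn()
        self._callbacks.clear()


class SortedIterator:
    """Toy sorter: holds a buffer until fully iterated or the task ends."""

    live_buffers = 0  # buffers currently held across all instances

    def __init__(self, rows, ctx):
        self._buffer = sorted(rows)
        self._pos = 0
        self._released = False
        SortedIterator.live_buffers += 1
        # Cleanup trigger 2: release when the task completes.
        ctx.add_completion_listener(self._release)

    def _release(self):
        # Idempotent, so both triggers can call it safely.
        if not self._released:
            self._released = True
            self._buffer = None
            SortedIterator.live_buffers -= 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._released or self._pos >= len(self._buffer):
            # Cleanup trigger 1: release once fully iterated.
            self._release()
            raise StopIteration
        row = self._buffer[self._pos]
        self._pos += 1
        return row


def run_task(outer_rows, inner_rows, consume_fully):
    """Rebuild the inner iterator once per outer row, as a nested loop would.

    Returns (peak live buffers during the task, live buffers after the task).
    """
    ctx = TaskContext()
    peak = 0
    for _ in outer_rows:
        it = SortedIterator(inner_rows, ctx)
        peak = max(peak, SortedIterator.live_buffers)
        if consume_fully:
            for _ in it:
                pass
        else:
            next(it, None)  # stop early, as an outer join may do
    ctx.mark_complete()
    return peak, SortedIterator.live_buffers
```

In this model, calling run_task with consume_fully=True keeps at most one buffer alive at a time, whereas consume_fully=False leaves one unreleased buffer per outer row until the task completes. That growth with the size of the outer side is the amplification the report attributes to the nested loop.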