[
https://issues.apache.org/jira/browse/SPARK-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian updated SPARK-13872:
------------------------
Attachment: Screen Shot 2016-03-11 at 5.42.32 PM.png
> Memory leak SortMergeOuterJoin
> ------------------------------
>
> Key: SPARK-13872
> URL: https://issues.apache.org/jira/browse/SPARK-13872
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Ian
> Attachments: Screen Shot 2016-03-11 at 5.42.32 PM.png
>
>
> SortMergeJoin composes its partition/iterator from
> org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to
> UnsafeExternalRowSorter.
> UnsafeExternalRowSorter's implementation cleans up the resources when:
> 1. org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator is fully
> iterated.
> 2. the task finishes execution.
> In the outer join case of SortMergeJoin, when the left or right iterator
> is not fully iterated, the only occasion for the resources to be cleaned
> up is at the end of the Spark task. This is probably OK most of the time;
> however, when a SortMergeOuterJoin is nested within a CartesianProduct, the
> "deferred" resource cleanup becomes a memory leak amplified by the
> CartesianRDD's outer loop iteration.
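
The amplification described above can be illustrated with a small, self-contained simulation. This is only a sketch in plain Python, not Spark code: TaskContext, SortedIterator, and run_task are hypothetical stand-ins invented for illustration, and the "buffer" is just a counter. It assumes the two cleanup triggers from the description (iterator fully consumed, or task completion) and shows how many buffers stay alive when an outer loop repeatedly creates an inner iterator that is abandoned early.

```python
class TaskContext:
    """Hypothetical stand-in for a task that runs cleanup callbacks when it ends."""

    def __init__(self):
        self._callbacks = []

    def add_completion_listener(self, callback):
        self._callbacks.append(callback)

    def complete(self):
        # Run every registered cleanup callback, then forget them.
        for callback in self._callbacks:
            callback()
        self._callbacks.clear()


class SortedIterator:
    """Hypothetical stand-in for a sorter-backed iterator that holds a buffer."""

    live_buffers = 0  # buffers currently held across all instances

    def __init__(self, rows, task):
        self._rows = sorted(rows)
        self._pos = 0
        self._released = False
        SortedIterator.live_buffers += 1
        # Cleanup trigger 2: the task finishes.
        task.add_completion_listener(self._release)

    def _release(self):
        # Idempotent: releasing twice must not double-count.
        if not self._released:
            self._released = True
            self._rows = []
            SortedIterator.live_buffers -= 1

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._rows):
            # Cleanup trigger 1: the iterator is fully consumed.
            self._release()
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row


def run_task(outer_rows, fully_consume):
    """Return (peak live buffers during the task, live buffers after it ends)."""
    task = TaskContext()
    peak = 0
    for _ in outer_rows:  # plays the role of the cartesian outer loop
        inner = SortedIterator([3, 1, 2], task)
        peak = max(peak, SortedIterator.live_buffers)
        if fully_consume:
            for _row in inner:
                pass
        else:
            next(inner)  # abandon the iterator early, as an outer join may
    task.complete()
    return peak, SortedIterator.live_buffers


print(run_task(range(5), fully_consume=True))
print(run_task(range(5), fully_consume=False))
```

In this sketch, fully consuming each inner iterator keeps at most one buffer alive at a time, whereas abandoning it early leaves one extra buffer alive per outer iteration until the task completes. That growth with the outer loop is the pattern the report describes as an amplified leak.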
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)