Ian created SPARK-13872:
---------------------------
Summary: Memory leak in SortMergeOuterJoin
Key: SPARK-13872
URL: https://issues.apache.org/jira/browse/SPARK-13872
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.1
Reporter: Ian
SortMergeJoin builds its partition iterators from
org.apache.spark.sql.execution.Sort, which in turn delegates the sorting to
UnsafeExternalRowSorter.
UnsafeExternalRowSorter cleans up its resources when:
1. the org.apache.spark.sql.catalyst.util.AbstractScalaRowIterator it returns is
fully iterated, or
2. the task finishes execution.
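The two cleanup occasions listed above can be modelled with a small, self-contained Java sketch. All class names here (FakeMemoryPool, FakeTaskContext, SortedIteratorSketch) are hypothetical stand-ins and do not reproduce UnsafeExternalRowSorter's real code; "memory" is just a byte counter, and the task context simply stores callbacks to run when the task ends.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a memory manager: just counts reserved bytes.
class FakeMemoryPool {
    long used = 0;
}

// Hypothetical stand-in for a task context: holds callbacks run when the task ends.
class FakeTaskContext {
    private final List<Runnable> completionListeners = new ArrayList<>();

    void addCompletionListener(Runnable listener) {
        completionListeners.add(listener);
    }

    // Simulates task completion: every registered cleanup runs now.
    void markTaskCompleted() {
        completionListeners.forEach(Runnable::run);
        completionListeners.clear();
    }
}

// Hypothetical model of a sorter's output iterator that holds memory and frees it
// on exactly two occasions: (1) full iteration, (2) task completion.
class SortedIteratorSketch implements Iterator<Integer> {
    private final Iterator<Integer> sorted;
    private final FakeMemoryPool pool;
    private final long reservedBytes;
    private boolean released = false;

    SortedIteratorSketch(List<Integer> input, long reservedBytes,
                         FakeMemoryPool pool, FakeTaskContext ctx) {
        List<Integer> copy = new ArrayList<>(input);
        copy.sort(null); // null comparator = natural ordering
        this.sorted = copy.iterator();
        this.pool = pool;
        this.reservedBytes = reservedBytes;
        pool.used += reservedBytes;               // acquire memory for the sort
        ctx.addCompletionListener(this::release); // occasion 2: end of task
    }

    // Idempotent: releasing twice must not double-free.
    private void release() {
        if (!released) {
            pool.used -= reservedBytes;
            released = true;
        }
    }

    @Override
    public boolean hasNext() {
        boolean more = sorted.hasNext();
        if (!more) {
            release(); // occasion 1: the iterator has been fully consumed
        }
        return more;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return sorted.next();
    }
}
```

If a consumer stops calling `hasNext()` before the end, as an outer join may do on a side it no longer needs, the reservation in this model survives until `markTaskCompleted()` runs.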
In the outer join case of SortMergeJoin, when the left or right iterator is
not fully iterated, the only occasion for the resources to be cleaned up
is at the end of the Spark task. This is probably OK most of the time; however,
when a SortMergeOuterJoin is nested within a CartesianProduct, the "deferred"
resource cleanup becomes a memory leak, amplified by the CartesianRDD's outer
loop iteration.
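The amplification described above can be illustrated with a hypothetical simulation (not Spark code): each pass of an outer loop, standing in for the CartesianRDD iteration, reserves memory for a fresh inner sort and stops reading it early, so only task completion frees it. The class name, the byte counts, and the `releaseEarly` flag are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Spark code: a Cartesian-style outer loop that builds a
// fresh sorted inner input on every pass but never fully consumes it.
class CartesianLeakSketch {
    long used = 0; // simulated bytes currently reserved
    long peak = 0; // highest value of `used` observed
    private final List<Runnable> taskCompletionCleanups = new ArrayList<>();

    // Reserves memory for one inner sort and returns an idempotent "free" callback.
    // The callback is also queued for task completion (the deferred cleanup path).
    private Runnable reserve(long bytes) {
        used += bytes;
        peak = Math.max(peak, used);
        boolean[] freed = {false};
        Runnable free = () -> {
            if (!freed[0]) {
                used -= bytes;
                freed[0] = true;
            }
        };
        taskCompletionCleanups.add(free);
        return free;
    }

    // One pass per outer row. Each pass stops reading the inner iterator early, as an
    // outer join may once its other side is exhausted, so exhaustion never frees it.
    void run(int outerRows, long bytesPerInnerSort, boolean releaseEarly) {
        for (int i = 0; i < outerRows; i++) {
            Runnable free = reserve(bytesPerInnerSort);
            if (releaseEarly) {
                free.run(); // hypothetical mitigation: free as soon as the join is done
            }
        }
    }

    // Simulates the end of the Spark task: all deferred cleanups finally run.
    void completeTask() {
        taskCompletionCleanups.forEach(Runnable::run);
        taskCompletionCleanups.clear();
    }
}
```

With `run(10, 100, false)` the simulated peak is 1000 bytes (10 passes × 100) and all of it stays reserved until `completeTask()`; with `releaseEarly = true` the peak stays at 100. The early-release flag only illustrates the idea of freeing resources once the join no longer needs them; it is not a description of how this issue was actually fixed.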
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)