[
https://issues.apache.org/jira/browse/SPARK-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davies Liu updated SPARK-16766:
-------------------------------
Priority: Minor (was: Critical)
> TakeOrderedAndProjectExec easily causes OOM
> ------------------------------------------
>
> Key: SPARK-16766
> URL: https://issues.apache.org/jira/browse/SPARK-16766
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2, 2.0.0
> Reporter: drow blonde messi
> Priority: Minor
>
> I found that a very simple SQL statement can easily cause an OOM.
> Like this:
> "insert into xyz2 select * from xyz order by x limit 900000000;"
> The problem is obvious: TakeOrderedAndProjectExec always allocates a huge Object
> array (its size equals the limit count) when executeCollect or
> doExecute is called.
> In Spark 1.6, terminal and non-terminal TakeOrderedAndProject work the same
> way: they call RDD.takeOrdered(limit), which builds a huge
> BoundedPriorityQueue for every partition.
> In Spark 2.0, non-terminal TakeOrderedAndProject switched to
> org.apache.spark.util.collection.Utils.takeOrdered, but the problem still
> exists: the expression ordering.leastOf(input.asJava, num).iterator.asScala
> calls the leastOf method of com.google.common.collect.Ordering, and a large
> Object array is produced:
> int bufferCap = k * 2;
> @SuppressWarnings("unchecked") // we'll only put E's in
> E[] buffer = (E[]) new Object[bufferCap];
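For illustration, the allocation described above can be sized up with simple arithmetic. This is a sketch, not Spark code: the limit value comes from the reported query, and the 8-byte reference size is an assumption for a 64-bit JVM without compressed oops.

```java
// Sketch: estimate the buffer Guava's Ordering.leastOf would allocate for
// the reported query "... order by x limit 900000000".
public class LeastOfBufferEstimate {
    public static void main(String[] args) {
        long limit = 900_000_000L;      // limit count from the reported SQL
        long bufferCap = limit * 2;     // leastOf allocates k * 2 Object slots
        long refBytes = 8;              // assumed reference size (64-bit, no compressed oops)
        long arrayBytes = bufferCap * refBytes;
        System.out.println("buffer slots:      " + bufferCap);
        System.out.println("reference array: ~" + arrayBytes / (1L << 30) + " GiB");
    }
}
```

Under these assumptions the reference array alone is roughly 13 GiB, before counting the row objects it points to, so a single call can exhaust the executor heap.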
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]