Reynold Xin created SPARK-8890:
----------------------------------
Summary: Reduce memory consumption for dynamic partition insert
Key: SPARK-8890
URL: https://issues.apache.org/jira/browse/SPARK-8890
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Priority: Critical
Currently, InsertIntoHadoopFsRelation can run out of memory if the number of
table partitions is large. The problem is that we open one output writer for
each partition, and when data are randomized and when the number of partitions
is large, we open a large number of output writers, leading to OOM.
The solution here is to inject a sorting operation once the number of active
partitions is beyond a certain point (e.g. 50?)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]