[jira] [Created] (SPARK-8890) Reduce memory consumption for dynamic partition insert

Reynold Xin (JIRA) Tue, 07 Jul 2015 23:50:38 -0700

Reynold Xin created SPARK-8890:
----------------------------------

             Summary: Reduce memory consumption for dynamic partition insert
                 Key: SPARK-8890
                 URL: https://issues.apache.org/jira/browse/SPARK-8890
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin
            Priority: Critical



Currently, InsertIntoHadoopFsRelation can run out of memory when the number of 
table partitions is large. The problem is that we open one output writer for 
each partition. When rows arrive in random partition order and the number of 
partitions is large, many output writers are open at the same time, leading to OOM.

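The proposed solution is to inject a sorting operation once the number of active 
partitions exceeds a certain threshold (e.g. 50?). Once the rows are sorted by 
partition, rows for the same partition are adjacent, so only one writer needs to 
be open at a time.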
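
A minimal sketch of this idea in plain Python, not Spark's actual code: rows are written through one writer per partition until a threshold is reached, after which the remaining rows are sorted by partition and written with a single open writer at a time. The names `Writer`, `dynamic_partition_insert` and `MAX_OPEN_WRITERS` are hypothetical, and the in-memory `sorted` call stands in for whatever (likely external) sort a real implementation would use.

```python
from collections import defaultdict

# Hypothetical threshold, echoing the "e.g. 50?" in the description.
MAX_OPEN_WRITERS = 50


class Writer:
    """Stand-in for an output writer; a real one holds buffers and a file handle."""

    def __init__(self, partition):
        self.partition = partition
        self.rows = []

    def write(self, row):
        self.rows.append(row)


def dynamic_partition_insert(rows, key, max_open=MAX_OPEN_WRITERS):
    """Write rows grouped by key(row); return (output, peak_open_writers)."""
    output = defaultdict(list)
    open_writers = {}
    peak = 0
    remaining = []

    # Phase 1: one writer per partition, until the threshold is reached.
    it = iter(rows)
    for row in it:
        p = key(row)
        if p not in open_writers:
            if len(open_writers) >= max_open:
                # Too many active partitions: defer this row and the rest.
                # (Simplification: the rest is materialized in memory here.)
                remaining = [row] + list(it)
                break
            open_writers[p] = Writer(p)
            peak = max(peak, len(open_writers))
        open_writers[p].write(row)

    # Flush and release every writer opened in phase 1.
    for p, w in open_writers.items():
        output[p].extend(w.rows)
    open_writers.clear()

    # Phase 2: sort the remaining rows by partition; one open writer at a time.
    current = None
    for row in sorted(remaining, key=key):
        p = key(row)
        if current is None or current.partition != p:
            if current is not None:
                output[current.partition].extend(current.rows)
            current = Writer(p)
            peak = max(peak, 1)
        current.write(row)
    if current is not None:
        output[current.partition].extend(current.rows)

    return dict(output), peak


if __name__ == "__main__":
    # 1000 rows spread over 200 partitions in interleaved order.
    rows = [(i % 200, i) for i in range(1000)]
    out, peak = dynamic_partition_insert(rows, key=lambda r: r[0])
    print(len(out), peak)
```

In this sketch the number of simultaneously open writers is bounded by the threshold regardless of how many partitions the input touches; the cost is the sort over the deferred rows.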






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

