[jira] [Created] (SPARK-8968) shuffled by the partition columns when dynamic partitioning to optimize the memory overhead

Fei Wang (JIRA) Thu, 09 Jul 2015 18:42:16 -0700

Fei Wang created SPARK-8968:
-------------------------------

             Summary: shuffled by the partition columns when dynamic 
partitioning to optimize the memory overhead
                 Key: SPARK-8968
                 URL: https://issues.apache.org/jira/browse/SPARK-8968
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Fei Wang



Dynamic partitioning currently performs poorly on large datasets because of 
GC/memory overhead. Each task opens a separate writer for every partition it 
writes to, which produces many small files and high GC pressure. We can 
shuffle the data by the partition columns so that each partition has only one 
partition file, which also reduces the GC overhead.
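
To illustrate the file-count effect, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the function names, the sample data, and the crc32-based routing are all assumptions made for this example. Each "task" is a list of (partition value, row) pairs, and a task is assumed to open one writer (one file) per distinct partition value it sees.

```python
import zlib

def count_output_files(tasks):
    """Count files written, assuming one writer per distinct partition value per task."""
    return sum(len({key for key, _ in rows}) for rows in tasks)

def shuffle_by_partition_key(tasks, num_tasks):
    """Route every row to a task chosen from its partition value (hypothetical hash routing)."""
    shuffled = [[] for _ in range(num_tasks)]
    for rows in tasks:
        for key, value in rows:
            target = zlib.crc32(key.encode("utf-8")) % num_tasks
            shuffled[target].append((key, value))
    return shuffled

# Hypothetical input: 4 tasks, each holding rows for the same 3 partition values.
days = ["2015-07-07", "2015-07-08", "2015-07-09"]
tasks = [[(day, f"row-{t}-{day}") for day in days] for t in range(4)]

before = count_output_files(tasks)                              # tasks x partitions
after = count_output_files(shuffle_by_partition_key(tasks, 4))  # one file per partition

print(before, after)
```

In this model the unshuffled write produces one file per (task, partition) pair, while the shuffled write produces one file per partition, because all rows with the same partition value end up in the same task.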



