Fei Wang created SPARK-8968:
-------------------------------

             Summary: Shuffle by the partition columns when dynamic 
partitioning to optimize the memory overhead
                 Key: SPARK-8968
                 URL: https://issues.apache.org/jira/browse/SPARK-8968
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Fei Wang


Dynamic partitioning currently performs poorly on large data sets because of 
GC and memory overhead. Each task opens a separate writer for every partition 
it encounters, which produces many small files and high GC pressure. We can 
shuffle the data by the partition columns so that each partition is written to 
only one file, which also reduces the GC overhead.
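
To make the effect concrete, here is a small plain-Python simulation of the 
idea. It does not use Spark, and it is only a sketch: the function names, the 
sample data, and the reducer count are hypothetical, and it assumes that one 
open writer corresponds to one output file.

```python
def count_files(tasks):
    """Count output files, assuming each task opens one writer (one file)
    per distinct partition value found in its rows."""
    return sum(len({key for key, _ in rows}) for rows in tasks)


def shuffle_by_partition(tasks, num_reducers):
    """Route every row to a reducer chosen by hashing its partition value,
    so all rows of a given partition end up in the same reducer."""
    reducers = [[] for _ in range(num_reducers)]
    for rows in tasks:
        for key, value in rows:
            reducers[hash(key) % num_reducers].append((key, value))
    return reducers


# Hypothetical input: 4 map tasks, each holding rows for the same 3 partitions.
days = ("2015-07-01", "2015-07-02", "2015-07-03")
tasks = [[(day, task_id) for day in days] for task_id in range(4)]

before = count_files(tasks)                       # 4 tasks x 3 partitions
shuffled = shuffle_by_partition(tasks, num_reducers=2)
after = count_files(shuffled)                     # one file per partition

print(before, after)
```

In this toy setup the file count drops from tasks x partitions to the number 
of distinct partitions, at the cost of an extra shuffle of the data.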



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
