[
https://issues.apache.org/jira/browse/SPARK-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621935#comment-14621935
]
Fei Wang commented on SPARK-8968:
---------------------------------
Changed. How about this?
> dynamic partitioning in spark sql performance issue due to the high GC
> overhead
> -------------------------------------------------------------------------------
>
> Key: SPARK-8968
> URL: https://issues.apache.org/jira/browse/SPARK-8968
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Fei Wang
>
> Currently, dynamic partitioning shows poor performance on big data due to
> GC/memory overhead. This is because each task opens a separate writer for
> each partition it encounters, which produces many small files and heavy GC
> pressure. We could shuffle the data by the partition columns first, so that
> each partition ends up in a single task and produces only one partition
> file; this would also reduce the GC overhead.
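A minimal pure-Python sketch of the idea (not actual Spark code; the dataset, task count, and partitioning scheme are hypothetical): without a shuffle, every task opens a writer for every partition key it sees, so the file count scales with tasks × partitions; shuffling by the partition column first means each partition's rows land in one place and produce one file.

```python
from collections import defaultdict

# Hypothetical dataset: (partition_key, value) rows; 4 distinct partitions.
rows = [("p%d" % (i // 25), i) for i in range(100)]
num_tasks = 8

def files_without_shuffle(rows, num_tasks):
    # Rows are split round-robin across tasks; each task opens one writer
    # per distinct partition key it encounters -> one file per (task, key).
    files = set()
    for task in range(num_tasks):
        for key, _ in rows[task::num_tasks]:
            files.add((task, key))
    return len(files)

def files_with_shuffle(rows):
    # Shuffle by the partition column first: all rows for a key go to the
    # same task, so each partition produces exactly one file.
    by_key = defaultdict(list)
    for key, value in rows:
        by_key[key].append(value)
    return len(by_key)

print(files_without_shuffle(rows, num_tasks))  # 8 tasks x 4 partitions = 32 files
print(files_with_shuffle(rows))                # 4 files, one per partition
```

The same effect is what a pre-write shuffle on the partition columns achieves in Spark: each writer holds buffers for only one partition instead of all of them.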
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)