[
https://issues.apache.org/jira/browse/SPARK-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621935#comment-14621935
]
Fei Wang commented on SPARK-8968:
---------------------------------
Changed. How about this?
> dynamic partitioning in spark sql performance issue due to the high GC
> overhead
> -------------------------------------------------------------------------------
>
> Key: SPARK-8968
> URL: https://issues.apache.org/jira/browse/SPARK-8968
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Fei Wang
>
> Currently, dynamic partitioning shows poor performance on big data due to
> GC/memory overhead. This is because each task opens a separate writer for
> each partition it encounters, which produces many small files and heavy GC
> pressure. We could shuffle the data by the partition columns first, so that
> each partition ends up in a single task and produces only one partition
> file; this would also reduce the GC overhead.
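A minimal pure-Python sketch of the idea (not actual Spark code; the dataset, task count, and partitioning scheme are hypothetical): without a shuffle, every task opens a writer for every partition key it sees, so the file count scales with tasks × partitions; shuffling by the partition column first means each partition's rows land in one place and produce one file.

```python
from collections import defaultdict

# Hypothetical dataset: (partition_key, value) rows; 4 distinct partitions.
rows = [("p%d" % (i // 25), i) for i in range(100)]
num_tasks = 8

def files_without_shuffle(rows, num_tasks):
    # Rows are split round-robin across tasks; each task opens one writer
    # per distinct partition key it encounters -> one file per (task, key).
    files = set()
    for task in range(num_tasks):
        for key, _ in rows[task::num_tasks]:
            files.add((task, key))
    return len(files)

def files_with_shuffle(rows):
    # Shuffle by the partition column first: all rows for a key go to the
    # same task, so each partition produces exactly one file.
    by_key = defaultdict(list)
    for key, value in rows:
        by_key[key].append(value)
    return len(by_key)

print(files_without_shuffle(rows, num_tasks))  # 8 tasks x 4 partitions = 32 files
print(files_with_shuffle(rows))                # 4 files, one per partition
```

The same effect is what a pre-write shuffle on the partition columns achieves in Spark: each writer holds buffers for only one partition instead of all of them.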
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)