[ https://issues.apache.org/jira/browse/SPARK-20107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943220#comment-15943220 ]
Yuming Wang commented on SPARK-20107:
-------------------------------------
I will create a PR later.
> Speed up FileOutputCommitter#commitJob for many output files
> ------------------------------------------------------------
>
> Key: SPARK-20107
> URL: https://issues.apache.org/jira/browse/SPARK-20107
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Yuming Wang
>
> It can save {{11 minutes}} when committing 216869 output files.
> This improvement affects all Cloudera Hadoop versions from cdh5-2.6.0_5.4.0 onward
> (see:
> https://github.com/cloudera/hadoop-common/commit/1c1236182304d4075276c00c4592358f428bc433
> and
> https://github.com/cloudera/hadoop-common/commit/16b2de27321db7ce2395c08baccfdec5562017f0)
> and Apache Hadoop 2.7.0 and higher.