[
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357043#comment-14357043
]
Hudson commented on MAPREDUCE-4815:
-----------------------------------
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2079 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2079/])
MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files.
(Siqi Li via gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)
*
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
*
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
*
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
* hadoop-mapreduce-project/CHANGES.txt
*
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
> Speed up FileOutputCommitter#commitJob for many output files
> ------------------------------------------------------------
>
> Key: MAPREDUCE-4815
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
> Reporter: Jason Lowe
> Assignee: Siqi Li
> Labels: perfomance
> Fix For: 2.7.0
>
> Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch,
> MAPREDUCE-4815.v12.patch, MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch,
> MAPREDUCE-4815.v15.patch, MAPREDUCE-4815.v16.patch, MAPREDUCE-4815.v17.patch,
> MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch,
> MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch,
> MAPREDUCE-4815.v9.patch
>
>
> If a job generates many files to commit then the commitJob method call at the
> end of the job can take minutes. This is a performance regression from 1.x,
> as 1.x had the tasks commit directly to the final output directory as they
> were completing and commitJob had very little to do. The commit work was
> processed in parallel and overlapped the processing of outstanding tasks. In
> 0.23/2.x, the commit is single-threaded and waits until all tasks have
> completed before commencing.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)