Jason Lowe created MAPREDUCE-4815:
-------------------------------------
Summary: FileOutputCommitter.commitJob can be very slow for jobs
with many output files
Key: MAPREDUCE-4815
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.0.1-alpha, 0.23.3
Reporter: Jason Lowe
If a job generates many output files, the commitJob method call at the end of
the job can take minutes. This is a performance regression from 1.x, where
tasks committed directly to the final output directory as they completed,
leaving commitJob very little to do. That commit work ran in parallel and
overlapped with the processing of outstanding tasks. In 0.23/2.x, the commit
is single-threaded and does not start until all tasks have completed.
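
To make the cost difference concrete, here is a minimal sketch that contrasts
a single-threaded commit with one spread across a thread pool. It is not the
FileOutputCommitter code: it uses java.nio on a local filesystem in place of
the Hadoop FileSystem API. The class CommitSketch, its method names, and the
layout of one temporary directory per task are all assumptions made for
illustration. It also assumes file names are unique across tasks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of Hadoop.
class CommitSketch {

    // Collect every regular file under the per-task temporary directories.
    static List<Path> listTaskOutputs(Path tempRoot) throws IOException {
        try (Stream<Path> walk = Files.walk(tempRoot)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    // Move one file into the final directory (assumes unique file names).
    static void moveToFinal(Path file, Path finalDir) {
        try {
            Files.move(file, finalDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Single-threaded commit: one move at a time, after all tasks are done.
    static int commitSequential(Path tempRoot, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        List<Path> files = listTaskOutputs(tempRoot);
        for (Path f : files) {
            moveToFinal(f, finalDir);
        }
        return files.size();
    }

    // Parallel commit: the same moves, issued concurrently from a fixed pool.
    static int commitParallel(Path tempRoot, Path finalDir, int threads)
            throws IOException, InterruptedException, ExecutionException {
        Files.createDirectories(finalDir);
        List<Path> files = listTaskOutputs(tempRoot);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path f : files) {
                pending.add(pool.submit(() -> moveToFinal(f, finalDir)));
            }
            for (Future<?> p : pending) {
                p.get(); // rethrows any failure from a worker thread
            }
        } finally {
            pool.shutdown();
        }
        return files.size();
    }
}
```

Against a remote filesystem each move is typically a round trip to the
metadata service, so the sequential variant scales linearly with the number
of output files. The parallel variant only hides that latency; it does not
recover the 1.x behavior of overlapping commit work with running tasks.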