[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505260#comment-13505260
 ] 

Arun C Murthy commented on MAPREDUCE-4815:
------------------------------------------

bq. Write permissions to the parent directory of the output directory is a new 
implicit requirement over the original FileOutputFormat. I think in the vast 
majority of cases it won't be a problem, but it is a potential 
backwards-compatibility issue.

That is already required today: FileOutputFormat creates the output dir in 
the parent dir itself, so it isn't a new requirement.

bq.  I think we should add this as an optimized path to FileOutputFormat, but 
keep the original, iterative rename scheme if the output directory isn't empty 
for backwards compatibility.

Makes sense. It's unfortunately much more code to maintain, and I'm not sure 
it's worth it, but a good idea nevertheless.
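
To make the two paths concrete, here is a minimal sketch in Python against the 
local filesystem. It is not the patch and it is not Hadoop's 
FileOutputCommitter code: commit_job, the sibling "pending" directory layout, 
and the "fast"/"iterative" return labels are all hypothetical names chosen for 
illustration. It only shows the shape of the idea: one directory rename when 
the output directory is absent or empty, and a per-file merge otherwise.

```python
import os


def _merge(src: str, dst: str) -> None:
    """Move src into dst one entry at a time, merging directories recursively."""
    if os.path.isdir(src):
        os.makedirs(dst, exist_ok=True)
        for name in os.listdir(src):
            _merge(os.path.join(src, name), os.path.join(dst, name))
        os.rmdir(src)  # src is empty once all of its children have moved
    else:
        os.replace(src, dst)  # overwrites an existing file of the same name


def commit_job(pending_dir: str, output_dir: str) -> str:
    """Publish pending_dir as output_dir (hypothetical helper, local FS only).

    Returns "fast" if a single directory rename was used, or "iterative" if
    entries were moved one by one into an existing, non-empty output directory.
    """
    if not os.path.exists(output_dir) or not os.listdir(output_dir):
        # Optimized path: nothing to preserve, so swap the whole directory in.
        # Removing and renaming output_dir both modify the parent directory,
        # so this path needs write permission on the parent.
        if os.path.isdir(output_dir):
            os.rmdir(output_dir)
        os.rename(pending_dir, output_dir)
        return "fast"

    # Compatibility path: keep whatever is already in output_dir and move the
    # new entries in individually. Cost grows with the number of output files.
    _merge(pending_dir, output_dir)
    return "iterative"
```

The first branch costs one rename regardless of how many files the job wrote; 
the second is the part that gets slow with many output files, which is why it 
would only be kept as the fallback.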

I have a preliminary patch that I'm testing; I'll upload it ASAP.
                
> FileOutputCommitter.commitJob can be very slow for jobs with many output files
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Assignee: Arun C Murthy
>
> If a job generates many files to commit then the commitJob method call at the 
> end of the job can take minutes.  This is a performance regression from 1.x, 
> as 1.x had the tasks commit directly to the final output directory as they 
> were completing and commitJob had very little to do.  The commit work was 
> processed in parallel and overlapped the processing of outstanding tasks.  In 
> 0.23/2.x, the commit is single-threaded and waits until all tasks have 
> completed before commencing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
