[ https://issues.apache.org/jira/browse/MAPREDUCE-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196502#comment-13196502 ]
Robert Joseph Evans commented on MAPREDUCE-3711:
------------------------------------------------

OK, I have mapped out the FileOutputCommitter directory structure and I have read through as much of the code as I can.

{noformat}
<outputPath>/_temporary/<appAttemptID (int)>/_temporary/_<taskAttemptID (string)>/
            |-----JobAttemptBaseDirName-----|
            |------------------tmpDir------------------|
            |-----------------------taskAttemptBaseDirName-----------------------|
|-------------------------------------workDir------------------------------------|
{noformat}

setupJob() creates the tmpDir directory.
commitJob() deletes the tmpDir directory, moves everything else in JobAttemptBaseDirName to outputPath, and then deletes _temporary under outputPath.
cleanupJob() and abortJob() just delete _temporary under outputPath.
setupTask() is a no-op.
commitTask() moves everything under workDir to JobAttemptBaseDirName.
abortTask() deletes workDir.
recoverTask() moves everything under the previous attempt's JobAttemptBaseDirName (appAttemptID - 1) to the current attempt's JobAttemptBaseDirName.

The problem is that we cannot just recover a single task with the current directory structure. The FileOutputFormat API allows a user to put anything they want into workDir. The onus is on the user to be sure that the output of one task will not collide with the output of another task. We provide some APIs to make this simple, and if it is just normal mapper or reducer output then we handle that internally, but if the outputs do collide we happily delete the first output file and move in the new one to replace it. This makes it impossible to recover the first completed task without recovering the second one too.

There are two possible ways to fix recoverTask that I see. The first is to add a recoverJob API in addition to recoverTask. In the case of FileOutputFormat, recoverJob would be implemented to do what recoverTask does now, except that it would also delete the _temporary directory under JobAttemptBaseDirName.
recoverTask would then become a no-op for FileOutputFormat. The second option is to completely rewrite the way that FileOutputFormat stores intermediate results. We would keep the output from each task separate until the job is committed. That way we could recover each task one at a time.

I am fine with either way. Since Vinod has also asked me to clean up the code, redoing the directory layout too is not that big of a deal. However, I am leaning towards adding recoverJob, as it seems like a good API to have in OutputFormat to begin with, and it is the smallest change that makes this work. If someone feels otherwise, please post a comment here. In the meantime I will try to get a patch up that adds recoverJob.

> AppMaster recovery for Medium to large jobs take long time
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-3711
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3711
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Siddharth Seth
>            Assignee: Robert Joseph Evans
>            Priority: Blocker
>
> Reported by [~karams]
> yarn.resourcemanager.am.max-retries=2
> Ran test cases with a sort job on 350 scale having 16800 maps and 680 reduces:
> 1. 70 secs after job submission the AM was killed using kill -9; around 3900
> maps were completed and 680 reduces were scheduled. A second AM was started
> and the job completed in 980 secs. The AM took very little time to recover.
> 2. 150 secs after job submission the AM was killed using kill -9; around 90%
> of the maps were completed and 680 reduces were scheduled. A second AM was
> started and the job completed in 1000 secs. The AM recovered.
> 3. 150 secs after job submission the AM was killed using kill -9; almost all
> maps were completed and only the 680 reduces were running. Recovery was too
> slow: the AM was still recovering after 1 hr 40 mins when I killed the run.
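
The collision problem described in the comment above can be reproduced with a toy model. What follows is a minimal sketch in Python on the local filesystem, not Hadoop code. The helper names (work_dir, job_attempt_dir, write_output, commit_task, committed_files, demo), the task attempt IDs, and the file names are all hypothetical. Only the directory layout and the "delete the first file, move in the new one" behaviour of commitTask() are taken from the description in the comment.

```python
import os
import shutil
import tempfile


def job_attempt_dir(output_path, app_attempt_id):
    # <outputPath>/_temporary/<appAttemptID>/
    return os.path.join(output_path, "_temporary", str(app_attempt_id))


def work_dir(output_path, app_attempt_id, task_attempt_id):
    # <outputPath>/_temporary/<appAttemptID>/_temporary/_<taskAttemptID>/
    return os.path.join(job_attempt_dir(output_path, app_attempt_id),
                        "_temporary", "_" + task_attempt_id)


def write_output(output_path, app_attempt_id, task_attempt_id, name, data):
    """Hypothetical helper: a task writes one file into its workDir."""
    directory = work_dir(output_path, app_attempt_id, task_attempt_id)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(data)


def commit_task(output_path, app_attempt_id, task_attempt_id):
    """Move everything under workDir into the job attempt directory.

    Toy model (plain files only): a file of the same name that an earlier
    task committed is deleted and replaced, as the comment describes.
    """
    src = work_dir(output_path, app_attempt_id, task_attempt_id)
    dst = job_attempt_dir(output_path, app_attempt_id)
    for name in os.listdir(src):
        target = os.path.join(dst, name)
        if os.path.exists(target):
            os.remove(target)  # the earlier task's output is silently lost
        shutil.move(os.path.join(src, name), target)
    shutil.rmtree(src)


def committed_files(output_path, app_attempt_id):
    """Return {file name: contents} for committed output, skipping tmpDir."""
    base = job_attempt_dir(output_path, app_attempt_id)
    result = {}
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        if os.path.isfile(path):
            with open(path) as handle:
                result[name] = handle.read()
    return result


def demo():
    """Two tasks write a file with the same name, then commit in order."""
    out = tempfile.mkdtemp()
    try:
        write_output(out, 1, "attempt_a", "side-file", "from task A")
        write_output(out, 1, "attempt_b", "side-file", "from task B")
        write_output(out, 1, "attempt_b", "part-b", "B only")
        commit_task(out, 1, "attempt_a")
        commit_task(out, 1, "attempt_b")
        return committed_files(out, 1)
    finally:
        shutil.rmtree(out)
```

After both commits in this model, the job attempt directory holds a single side-file containing task B's data, and nothing records that task A ever wrote one. Restoring task A alone would therefore need information the layout no longer has, which is the reason given in the comment for either recovering the whole job attempt at once (recoverJob) or keeping each task's output separate until the job is committed.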