[
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated MAPREDUCE-5641:
-------------------------------------
Attachment: MAPREDUCE-5641.patch
I’ve attached a preliminary version of the patch. Once we all agree on the
specifics of the design, I can add unit tests.
The patch follows the design I outlined before: the RM writes a file when it
sees an AM die, and the JHS sees that file and copies the jhist and similar
files to the done_intermediate dir. I have tested this by running jobs and
killing the AM. This results in incomplete information, as expected; however,
in some cases some of the information is missing or doesn’t make sense (e.g.
no Finish Time if the AM didn’t actually finish). I’ve put in some code to
take care of these situations. I’ve also attached a preliminary YARN patch to
YARN-1731.
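To make the flow above concrete, here is a minimal sketch of the marker-file
handoff. It is not taken from the attached patch: the class and method names,
the ".failed" marker suffix, and the per-job staging layout are all assumptions
made for illustration, and plain local files (java.nio.file) stand in for HDFS.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names here are hypothetical, and local files
// stand in for HDFS.
final class FailedAmHistorySketch {
    private static final String MARKER_SUFFIX = ".failed"; // assumed naming

    // RM side: record that the AM for jobId died by writing an empty marker file.
    static Path writeMarker(Path markerDir, String jobId) throws IOException {
        Files.createDirectories(markerDir);
        Path marker = markerDir.resolve(jobId + MARKER_SUFFIX);
        if (Files.notExists(marker)) {
            Files.createFile(marker);
        }
        return marker;
    }

    // JHS side: for every marker, copy the job's regular files from its staging
    // directory into doneIntermediate, then delete the marker.
    static List<Path> processMarkers(Path markerDir, Path stagingRoot,
                                     Path doneIntermediate) throws IOException {
        // Snapshot the markers first so deleting them does not disturb iteration.
        List<Path> markers = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(markerDir, "*" + MARKER_SUFFIX)) {
            for (Path m : stream) {
                markers.add(m);
            }
        }
        Files.createDirectories(doneIntermediate);
        List<Path> copied = new ArrayList<>();
        for (Path marker : markers) {
            String name = marker.getFileName().toString();
            String jobId = name.substring(0, name.length() - MARKER_SUFFIX.length());
            Path jobStaging = stagingRoot.resolve(jobId);
            if (Files.isDirectory(jobStaging)) {
                try (DirectoryStream<Path> files = Files.newDirectoryStream(jobStaging)) {
                    for (Path f : files) {
                        if (Files.isRegularFile(f)) {
                            Path target = doneIntermediate.resolve(f.getFileName());
                            Files.copy(f, target, StandardCopyOption.REPLACE_EXISTING);
                            copied.add(target);
                        }
                    }
                }
            }
            Files.delete(marker);
        }
        return copied;
    }
}
```

In this sketch the marker is deleted only after the copy, so a JHS that
restarts mid-copy would simply pick the marker up again on its next scan.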
{quote}
How will the JHS copy the file to the intermediate directory? It likely won't
have access to the staging directory containing the jhist file.
{quote}
I modified the permissions from 0700 to 0701.
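For readers less familiar with octal modes, the small sketch below decodes the
two values. The ModeBits class is hypothetical and not part of the patch. It
only illustrates the standard POSIX/HDFS meaning of the bits: on a directory,
the "others" execute bit allows traversal (opening a child whose path is
already known) without granting permission to list the directory.

```java
// Illustrative helper (hypothetical name) for reading octal permission modes.
final class ModeBits {
    // Render the low nine permission bits as an "rwxrwxrwx"-style string.
    static String toSymbolic(int mode) {
        char[] flags = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    // "Others" execute bit: on a directory, permission to traverse it.
    static boolean othersCanTraverse(int mode) {
        return (mode & 0001) != 0;
    }

    // "Others" read bit: on a directory, permission to list its contents.
    static boolean othersCanList(int mode) {
        return (mode & 0004) != 0;
    }
}
```

Here ModeBits.toSymbolic(0700) gives "rwx------" and ModeBits.toSymbolic(0701)
gives "rwx-----x": with 0701, other users can traverse the directory but still
cannot list it.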
> History for failed Application Masters should be made available to the Job
> History Server
> -----------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5641
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, jobhistoryserver
> Affects Versions: 2.2.0
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: MAPREDUCE-5641.patch
>
>
> Currently, the JHS has no information about jobs whose AMs have failed. This
> is because the History is written by the AM to the intermediate folder just
> before finishing, so when it fails for any reason, this information isn't
> copied there. However, it is not lost as it's in the AM's staging directory.
> To make the History available in the JHS, all we need to do is have another
> mechanism to move the History from the staging directory to the intermediate
> directory. The AM also writes a "Summary" file before exiting normally,
> which is also unavailable when the AM fails.