[ https://issues.apache.org/jira/browse/MAPREDUCE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikayla Konst updated MAPREDUCE-7183:
-------------------------------------
Attachment: MAPREDUCE-7183.patch
Status: Patch Available (was: Open)
> Make app master recover history from latest history file that exists
> --------------------------------------------------------------------
>
> Key: MAPREDUCE-7183
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7183
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster
> Reporter: Mikayla Konst
> Priority: Major
> Attachments: MAPREDUCE-7183.patch
>
>
> When the original app master of a running MapReduce job is killed, the new
> app master normally attempts to recover by reading the jhist file that was
> written by the app master from the previous app attempt (i.e. current app
> attempt - 1).
> This is usually fine, but is a problem in the following situation:
> # App master 1 writes history to jobid_1.jhist, then is killed
> # App master 2 starts up but is killed before it has the chance to write any
> history to jobid_2.jhist
> # App master 3 attempts to recover, but it can't find jobid_2.jhist, so all
> job progress is lost.
> This problem manifests as the errors "Unable to parse prior job history,
> aborting recovery" and "Could not parse the old history file. Will not have
> old AMinfos", the loss of all job progress, and previous app attempts not
> showing up in the job history UI.
> To fix this problem, if jobid_2.jhist is missing, app master 3 should just
> recover using the history in jobid_1.jhist.
> Related JIRAs that mention this same problem:
> https://issues.apache.org/jira/browse/MAPREDUCE-4729
> https://issues.apache.org/jira/browse/MAPREDUCE-4767
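
The fallback proposed in the description above (use the latest history file
that exists, rather than only the one from the immediately preceding attempt)
could be sketched roughly as follows. This is an illustration only, not the
contents of the attached patch: the class HistoryRecoverySketch and the method
findLatestHistoryFile are hypothetical names, the jobid_N.jhist naming simply
follows the example in the description, and a local directory accessed through
java.nio.file stands in for the file system the app master would actually read
from.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch; names and file layout are assumptions, not Hadoop APIs.
public class HistoryRecoverySketch {

    /**
     * Walks backwards from the previous attempt and returns the first
     * history file that exists, or empty if no earlier attempt wrote one.
     */
    static Optional<Path> findLatestHistoryFile(Path historyDir, String jobId, int currentAttempt) {
        for (int attempt = currentAttempt - 1; attempt >= 1; attempt--) {
            Path candidate = historyDir.resolve(jobId + "_" + attempt + ".jhist");
            if (Files.exists(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Reproduce the scenario from the description: attempt 1 wrote a
        // history file, attempt 2 was killed before writing anything.
        Path dir = Files.createTempDirectory("jhist-sketch");
        Files.createFile(dir.resolve("jobid_1.jhist"));

        // Attempt 3 skips the missing jobid_2.jhist and falls back to attempt 1.
        Optional<Path> found = findLatestHistoryFile(dir, "jobid", 3);
        System.out.println(found.map(p -> p.getFileName().toString())
                                .orElse("no history found"));
        // prints "jobid_1.jhist"
    }
}
```

Searching downwards from current attempt - 1 keeps the existing behaviour when
the previous attempt did write a history file, and only changes the outcome
when that file is missing.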
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)