[ https://issues.apache.org/jira/browse/MAPREDUCE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang reassigned MAPREDUCE-7183:
------------------------------------------

    Assignee: Mikayla Konst

> Make app master recover history from latest history file that exists
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7183
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7183
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>            Reporter: Mikayla Konst
>            Assignee: Mikayla Konst
>            Priority: Major
>         Attachments: MAPREDUCE-7183.patch
>
>
> When running a mapreduce job, when the original app master is killed, the new
> app master normally attempts to recover by reading the jhist file that was
> written by the app master from the previous app attempt (e.g. current app
> attempt - 1).
> This is usually fine, but is a problem in the following situation:
> # App master 1 writes history to jobid_1.jhist, then is killed
> # App master 2 starts up but is killed before it has the chance to write any
> history to jobid_2.jhist
> # App master 3 attempts to recover, but it can't find jobid_2.jhist, so all
> job progress is lost.
> This problem manifests as "Unable to parse prior job history, aborting
> recovery" and "Could not parse the old history file. Will not have old
> AMinfos" errors, all job progress being lost, and previous app attempts not
> showing up in the job history UI.
> To fix this problem, if jobid_2.jhist is missing, app master 3 should just
> recover using the history in jobid_1.jhist.
> Related JIRAs that mention this same problem:
> https://issues.apache.org/jira/browse/MAPREDUCE-4729
> https://issues.apache.org/jira/browse/MAPREDUCE-4767
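
The fallback proposed in the description can be sketched roughly as follows. This is not the attached patch and does not use Hadoop's actual classes: the class name `HistoryRecoverySketch`, the methods `historyFileName` and `latestExistingHistoryFile`, and the `jobid_<attempt>.jhist` naming (taken from the example in the description) are all assumptions made for illustration. The idea is simply to walk backwards from the previous attempt until a history file is found.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

public class HistoryRecoverySketch {

    // Assumed naming scheme, mirroring the description's example: jobid_<attempt>.jhist
    static String historyFileName(String jobId, int attempt) {
        return jobId + "_" + attempt + ".jhist";
    }

    // Walk backwards from (currentAttempt - 1) down to attempt 1 and return the
    // first history file that exists. The existence check is passed in as a
    // predicate so the sketch stays independent of any particular file system.
    static Optional<String> latestExistingHistoryFile(
            String jobId, int currentAttempt, Predicate<String> exists) {
        for (int attempt = currentAttempt - 1; attempt >= 1; attempt--) {
            String candidate = historyFileName(jobId, attempt);
            if (exists.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        // No earlier attempt left a history file; nothing to recover from.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Scenario from the description: attempt 1 wrote history, attempt 2 did not.
        Set<String> present = Set.of("jobid_1.jhist");
        Optional<String> found =
                latestExistingHistoryFile("jobid", 3, present::contains);
        System.out.println(found.orElse("no history file found"));
    }
}
```

In the scenario from the description, attempt 3 would skip the missing jobid_2.jhist and recover from jobid_1.jhist instead of aborting recovery.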