[ https://issues.apache.org/jira/browse/MAPREDUCE-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junping Du updated MAPREDUCE-6263:
----------------------------------
Summary: Configurable timeout between YARNRunner terminate the application
and forcefully kill. (was: Large jobs can lose history when killed due to
brief client timeout)
> Configurable timeout between YARNRunner terminate the application and
> forcefully kill.
> --------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6263
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6263
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Eric Payne
> Attachments: MAPREDUCE-6263.v1.txt, MAPREDUCE-6263.v2.txt
>
>
> YARNRunner connects to the AM to send the kill job command, then waits a
> hardcoded 10 seconds for the job to enter a terminal state. If the job does
> not reach a terminal state within that time, YARNRunner tells YARN to kill
> the application forcefully. A forced kill usually results in no job history,
> since the AM process is killed before it can finish shutting down.
> Ten seconds can be too short for large jobs in a large cluster, as it takes
> time to connect to all the nodemanagers, process the state machine events,
> and copy a large jhist file. The timeout should be more lenient or
> configurable.
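
The behavior described above (ask the AM to kill the job, poll for a terminal state, then fall back to a forced kill) can be sketched with the wait passed in as a parameter instead of a hardcoded 10 seconds. This is a minimal illustration only, not the attached patch: the class KillTimeoutSketch, the method killJob, and the parameters hardKillTimeoutMs and pollIntervalMs are hypothetical names, and the three callbacks stand in for the real AM and YARN client calls.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from Hadoop itself.
final class KillTimeoutSketch {

    /** How the job ended up being stopped. */
    enum KillResult { GRACEFUL, FORCED }

    /**
     * Sends the kill request, then polls until the job is terminal or the
     * timeout expires. Only after the timeout does it force-kill.
     *
     * @param sendKillToAm      stand-in for the "kill job" call to the AM
     * @param isTerminal        stand-in for a job-state query
     * @param forceKill         stand-in for asking YARN to kill the application
     * @param hardKillTimeoutMs how long to wait before forcing (assumed configurable)
     * @param pollIntervalMs    delay between state checks
     */
    static KillResult killJob(Runnable sendKillToAm,
                              BooleanSupplier isTerminal,
                              Runnable forceKill,
                              long hardKillTimeoutMs,
                              long pollIntervalMs) {
        sendKillToAm.run();
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(hardKillTimeoutMs);

        // Overflow-safe comparison of nanoTime values.
        while (System.nanoTime() - deadline < 0) {
            if (isTerminal.getAsBoolean()) {
                return KillResult.GRACEFUL;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting early.
                Thread.currentThread().interrupt();
                break;
            }
        }

        // One last check so a job that finished during the final sleep
        // is not killed forcefully.
        if (isTerminal.getAsBoolean()) {
            return KillResult.GRACEFUL;
        }
        forceKill.run();
        return KillResult.FORCED;
    }

    public static void main(String[] args) {
        // Simulated job that reports a terminal state on the third poll.
        int[] polls = {0};
        KillResult result = killJob(
                () -> System.out.println("kill request sent to AM"),
                () -> ++polls[0] >= 3,
                () -> System.out.println("forced kill via YARN"),
                2000,  // lenient timeout, in milliseconds
                5);
        System.out.println("result: " + result);
    }
}
```

With a short timeout the same simulated job would be force-killed before it reached a terminal state, which is the history-losing case the report describes. In a real client the timeout would presumably be read from the job configuration rather than passed as a literal.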