[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699773#comment-15699773
 ] 

Prabhu Joseph commented on MAPREDUCE-6711:
------------------------------------------

Hi [~djp], sorry, I missed the mail. The job is map-only and has a single map 
task. Once the map attempt and the task have SUCCEEDED, the job transitions 
from RUNNING to the COMMITTING state. At this point, if the succeeded attempt 
is killed as part of container preemption, a T_ATTEMPT_KILLED event is raised 
and the task transitions from SUCCEEDED to SCHEDULED. 
TaskImpl#RetroactiveKilledTransition then tells the job about the rescheduling 
by raising both JOB_TASK_ATTEMPT_COMPLETED and JOB_MAP_TASK_RESCHEDULED. The 
job receives both events in the COMMITTING state and fails because neither 
transition is handled there.

It looks like the fix can ignore JOB_TASK_ATTEMPT_COMPLETED but not 
JOB_MAP_TASK_RESCHEDULED. For the latter, it should instead move the COMMITTING 
job back to the RUNNING state and reschedule the map task, as below:

addTransition(JobStateInternal.COMMITTING, JobStateInternal.RUNNING, 
JobEventType.JOB_MAP_TASK_RESCHEDULED, new MapTaskRescheduledTransition())
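
To make the proposal concrete, here is a minimal, self-contained sketch of the 
idea. It is not Hadoop code: the class CommittingPreemptionSketch, its 
completedMaps counter, and the use of IllegalStateException in place of 
InvalidStateTransitonException are all assumptions made for illustration. It 
only models a transition table in which COMMITTING ignores 
JOB_TASK_ATTEMPT_COMPLETED and goes back to RUNNING on JOB_MAP_TASK_RESCHEDULED.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the job state machine; not Hadoop code.
final class CommittingPreemptionSketch {
    enum State { RUNNING, COMMITTING }
    enum Event { JOB_TASK_ATTEMPT_COMPLETED, JOB_MAP_TASK_RESCHEDULED }

    // (state, event) -> next state; a missing entry models an unhandled transition.
    private final Map<State, Map<Event, State>> arcs = new EnumMap<>(State.class);
    private State current;
    private int completedMaps; // assumed bookkeeping, for illustration only

    CommittingPreemptionSketch(State initial, int completedMaps) {
        this.current = initial;
        this.completedMaps = completedMaps;
    }

    void addTransition(State from, State to, Event on) {
        arcs.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class))
            .put(on, to);
    }

    void handle(Event event) {
        Map<Event, State> fromCurrent = arcs.get(current);
        State next = (fromCurrent == null) ? null : fromCurrent.get(event);
        if (next == null) {
            // Mirrors the "Invalid event: ... at COMMITTING" failure in the log.
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        if (event == Event.JOB_MAP_TASK_RESCHEDULED) {
            completedMaps--; // the succeeded map has to run again
        }
        current = next;
    }

    State state() { return current; }

    int completedMaps() { return completedMaps; }

    // A job already in COMMITTING with one completed map and the two proposed arcs.
    static CommittingPreemptionSketch withProposedTransitions() {
        CommittingPreemptionSketch job =
            new CommittingPreemptionSketch(State.COMMITTING, 1);
        // Ignore the completion event: stay in COMMITTING.
        job.addTransition(State.COMMITTING, State.COMMITTING,
            Event.JOB_TASK_ATTEMPT_COMPLETED);
        // Go back to RUNNING so the map task can be rescheduled.
        job.addTransition(State.COMMITTING, State.RUNNING,
            Event.JOB_MAP_TASK_RESCHEDULED);
        return job;
    }

    public static void main(String[] args) {
        CommittingPreemptionSketch job = withProposedTransitions();
        job.handle(Event.JOB_TASK_ATTEMPT_COMPLETED);
        job.handle(Event.JOB_MAP_TASK_RESCHEDULED);
        System.out.println(job.state() + ", completed maps = " + job.completedMaps());
    }
}
```

Without the two addTransition calls, handle() throws for either event while the 
sketch is in COMMITTING, which corresponds to the failure reported in this 
issue.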

Please share your comments.

> JobImpl fails to handle preemption events on state COMMITTING
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-6711
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6711
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Li Lu
>            Assignee: Prabhu Joseph
>         Attachments: MAPREDUCE-6711.1.patch, MAPREDUCE-6711.patch
>
>
> When an MR app was preempted in the COMMITTING state, we saw the following 
> exceptions in its log:
> {code}
> ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_TASK_ATTEMPT_COMPLETED at COMMITTING
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
>         at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
>         at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
>         at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>         at java.lang.Thread.run(Thread.java:744)
> {code}
> and 
> {code}
> ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_MAP_TASK_RESCHEDULED at COMMITTING
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
>         at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
>         at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
>         at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>         at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>         at java.lang.Thread.run(Thread.java:744)
> {code}
> It seems we need to handle these preemption-related events while the job is 
> being committed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
