[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772301#comment-13772301 ]

Bikas Saha commented on MAPREDUCE-5505:
---------------------------------------

Are we sure that previous state is always RUNNING before FAILED?
{code}
+      case FAILED:
+        if (isUnregistered) {
+          return JobState.FAILED;
+        } else {
+          return JobState.RUNNING;
{code}
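To make the question concrete, here is a minimal sketch of the masking in the patch excerpt. It uses hypothetical, simplified enums (`InternalState`, `ExternalState`) and a hypothetical `toExternal` method instead of the real MapReduce job-state types. For illustration only, it assumes that a job can go straight from NEW to FAILED. Under that assumption, a client polling before unregistration sees NEW, then RUNNING, then FAILED, even though the job never actually ran.

```java
// Hypothetical, simplified model of the patch's state masking;
// not the real JobImpl code.
public class MaskedStateSketch {

    // Simplified internal states (the real enum has more values).
    enum InternalState { NEW, RUNNING, FAILED }

    // Simplified client-visible states.
    enum ExternalState { NEW, RUNNING, FAILED }

    // Mirrors the patch excerpt: until unregistration succeeds, a FAILED job
    // is reported to clients as RUNNING, whatever state it failed from.
    static ExternalState toExternal(InternalState s, boolean isUnregistered) {
        switch (s) {
            case NEW:
                return ExternalState.NEW;
            case RUNNING:
                return ExternalState.RUNNING;
            case FAILED:
                return isUnregistered ? ExternalState.FAILED : ExternalState.RUNNING;
            default:
                throw new IllegalArgumentException("unknown state " + s);
        }
    }

    public static void main(String[] args) {
        // Assumed for illustration: the job fails straight from NEW.
        System.out.println(toExternal(InternalState.NEW, false));    // NEW
        // Failed but not yet unregistered: masked as a state never reached.
        System.out.println(toExternal(InternalState.FAILED, false)); // RUNNING
        System.out.println(toExternal(InternalState.FAILED, true));  // FAILED
    }
}
```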

Instead of isUnregistered, let us create an AtomicBoolean called 
safeToReportTerminationToUser. Rather than exposing it through JobImpl, this 
boolean can be made visible via the AppContext object so that every component 
has access to it. When should the boolean be set to true? We could do it in 
RMCommunicator after unregister succeeds (as this patch does). Or we can do it 
in MRClientService.serviceStop(): since MRClientService is the last service to 
stop(), by then we can be sure that everything finished cleanly. Once 
MRClientService.serviceStop() sets the boolean, we can move the sleep(5sec) 
from MRAppMaster into MRClientService.serviceStop(), right after setting the 
boolean. 
We should leave a comment explaining this in MRAppMaster.shutdown() before the 
call to clientService.stop() so that it's easy for someone else to follow this 
logic.
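The proposed ordering can be modelled as below. This is a minimal sketch under stated assumptions, not Hadoop code: `SketchAppContext`, `SketchClientService`, `reportedState` and the `State` enum are hypothetical stand-ins for AppContext, MRClientService, the client-facing status lookup and the job-state enum. The 5-second sleep is turned into a `gracePeriodMs` parameter so the sketch runs without waiting.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal model of the proposed shutdown ordering; all class and method
// names here are hypothetical stand-ins, not Hadoop APIs.
public class SafeTerminationSketch {

    enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Stand-in for AppContext: holding the flag here (not in JobImpl) lets
    // every component that has the context read it.
    static class SketchAppContext {
        final AtomicBoolean safeToReportTerminationToUser = new AtomicBoolean(false);
    }

    // Stand-in for MRClientService, assumed to be the last service stopped.
    static class SketchClientService {
        private final SketchAppContext context;
        private final long gracePeriodMs; // the proposal uses 5 seconds

        SketchClientService(SketchAppContext context, long gracePeriodMs) {
            this.context = context;
            this.gracePeriodMs = gracePeriodMs;
        }

        void serviceStop() {
            // Every other service (including RM unregistration) has already
            // stopped, so the terminal state may now be shown to clients.
            context.safeToReportTerminationToUser.set(true);
            try {
                // Delay shutdown by the grace period, intended to give
                // polling clients time to see the final state.
                Thread.sleep(gracePeriodMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
    }

    // Stand-in for the client-facing status lookup: terminal states are
    // reported as RUNNING until the flag is set.
    static State reportedState(SketchAppContext context, State internal) {
        if (internal != State.RUNNING && !context.safeToReportTerminationToUser.get()) {
            return State.RUNNING;
        }
        return internal;
    }

    public static void main(String[] args) {
        SketchAppContext context = new SketchAppContext();
        SketchClientService clientService = new SketchClientService(context, 0L);

        System.out.println(reportedState(context, State.FAILED)); // RUNNING
        clientService.serviceStop(); // last step of the AM shutdown sequence
        System.out.println(reportedState(context, State.FAILED)); // FAILED
    }
}
```

The flag is written once by a single service and read by many client-facing threads; AtomicBoolean's get/set give that cross-thread visibility without extra locking.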

Please do run single-node tests to verify the real behavior, including across 
an RM restart.
                
> Clients should be notified job finished only after job successfully 
> unregistered 
> ---------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5505
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Zhijie Shen
>         Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch
>
>
> This is to make sure the user is notified that the job finished only after the 
> job is really done. This does increase client latency, but it can reduce some 
> races during unregister, such as YARN-540.

