[jira] [Created] (MAPREDUCE-7160) Job end notification not sent and client service not stopped after YarnRuntimeException in shutDownJob in MRAppMaster.java

John Thomason (JIRA) Tue, 20 Nov 2018 16:15:29 -0800

John Thomason created MAPREDUCE-7160:
----------------------------------------


             Summary: Job end notification not sent and client service not 
stopped after YarnRuntimeException in shutDownJob in MRAppMaster.java
                 Key: MAPREDUCE-7160
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7160
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
    Affects Versions: 3.0.3
            Reporter: John Thomason


If a YarnRuntimeException occurs during shutDownJob (in this case, at the line 
'MRAppMaster.this.stop();') in MRAppMaster.java, 
{code:java}
2018-11-20 12:38:46,173 INFO [Thread-171] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing 
event TASK_FINISHED 
2018-11-20 12:38:46,173 ERROR [Thread-171] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing 
History Event: 
org.apache.hadoop.mapreduce.jobhistory.TaskFinishedEvent@3b1a7bb5 
java.io.IOException: All datanodes 
[DatanodeInfoWithStorage[10.11.1.227:9866,DS-3539e7b8-5d87-45e5-880a-1897f11577d2,DISK]]
 are bad. Aborting... 
at 
org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1561) 
at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1495)
 at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
 at 
org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1256)
 at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:667) 2018-11-20 
12:38:46,174 INFO [Thread-171] org.apache.hadoop.service.AbstractService: 
Service JobHistoryEventHandler failed in state STOPPED

{code}
then the line 
{code:java}
catch (Throwable t) { 
LOG.warn("Graceful stop failed. Exiting.. ", t); 
exitMRAppMaster(1, t); 
}

{code}
causes the MRAppMaster to exit before the finally block can execute. This means 
that the job end notification is not sent despite 
[https://jira.apache.org/jira/browse/MAPREDUCE-6895?attachmentOrder=desc]. 

Additionally, this finally block should also call 'clientService.stop();', as 
otherwise the following errors occur on the client side:
{code:java}
2018-11-07 19:50:59,602 INFO mapreduce.Job: map 100% reduce 98% 
2018-11-07 19:51:03,617 INFO mapreduce.Job: map 100% reduce 99% 
2018-11-07 19:51:07,648 INFO mapreduce.Job: map 100% reduce 100% 
2018-11-07 19:51:07,656 INFO mapreduce.Job: Job job_1541647829228_0001 
completed successfully 
2018-11-07 19:51:08,162 INFO mapred.ClientServiceDelegate: Application state is 
completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 
2018-11-07 19:51:09,169 INFO ipc.Client: Retrying connect to server: 
0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 
2018-11-07 19:51:10,170 INFO ipc.Client: Retrying connect to server: 
0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 
2018-11-07 19:51:11,171 INFO ipc.Client: Retrying connect to server: 
0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
......
2018-11-07 19:51:38,407 INFO ipc.Client: Retrying connect to server: 
0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 
java.io.IOException: java.net.ConnectException: Your endpoint configuration is 
wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort 
at 
org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:344)
 at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:382)
 at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:859) at 
org.apache.hadoop.mapreduce.Job$8.run(Job.java:820) at 
org.apache.hadoop.mapreduce.Job$8.run(Job.java:817) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
 at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:817) at 
org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1663) at 
org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591) at 
org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:334) at 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:343) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
 at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at 
org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.hadoop.util.RunJar.run(RunJar.java:244) at 
org.apache.hadoop.util.RunJar.main(RunJar.java:158)
{code}
 

Adding calls to send the job end notification and stop the client service prior 
to ' exitMRAppMaster' within the catch block fixes this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Created] (MAPREDUCE-7160) Job end notification not sent and client service not stopped after YarnRuntimeException in shutDownJob in MRAppMaster.java

Reply via email to