[ https://issues.apache.org/jira/browse/MAPREDUCE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alejandro Abdelnur resolved MAPREDUCE-4109.
-------------------------------------------

    Resolution: Invalid

After looking at the code, my assumptions proved incorrect; such a scenario is not possible. What may be happening is MAPREDUCE-3972.

> availability of a job info in HS should be atomic
> -------------------------------------------------
>
>                 Key: MAPREDUCE-4109
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4109
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, jobhistoryserver, mrv2
>   Affects Versions: 2.0.0
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> It seems that the HS starts serving info about a job before it has all the
> info available.
> In the trace below, a RunningJob throws an NPE when trying to access the
> counters.
> This happens on and off, so I assume it is related to either the AM not
> flushing all job info to HDFS before notifying the HS, or the HS not loading
> all the job info from HDFS before it starts serving it.
> In case it helps to diagnose the issue, this is happening in a secure cluster.
> This causes Oozie to mark jobs as failed.
> {code}
> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getCounters(HistoryClientService.java:214)
>     at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:149)
>     at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:206)
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:355)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654)
>     at LocalTrace:
>     org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:163)
>     at $Proxy31.getCounters(Unknown Source)
>     at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getCounters(MRClientProtocolPBClientImpl.java:162)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:325)
>     at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:472)
>     at org.apache.hadoop.mapreduce.Job$8.run(Job.java:714)
>     at org.apache.hadoop.mapreduce.Job$8.run(Job.java:711)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:711)
>     at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
>     at org.apache.oozie.action.hadoop.LauncherMapper.hasIdSwap(LauncherMapper.java:296)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:886)
>     at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:162)
>     at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:51)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:260)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
> {code}
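
Since the failure is reported as intermittent, a caller that polls for counters could, as a stopgap, retry the lookup a few times before treating the job as failed. The following is only a minimal sketch of that idea in plain Java; it is not the resolution of this issue, and it makes no claim about the root cause. The class `CounterRetry`, the method `fetchWithRetry`, and its parameters are hypothetical names introduced for illustration and are not part of Hadoop or Oozie.

```java
import java.util.concurrent.Callable;

public class CounterRetry {

    /**
     * Hypothetical helper: invokes {@code fetch} up to {@code maxAttempts}
     * times and returns the first non-null result. A thrown exception or a
     * null result counts as "not ready yet" and triggers another attempt
     * after {@code delayMs} milliseconds.
     */
    public static <T> T fetchWithRetry(Callable<T> fetch, int maxAttempts, long delayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = fetch.call();
                if (result != null) {
                    return result;
                }
            } catch (Exception e) {
                // Remember the failure; it is rethrown if every attempt fails.
                last = e;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMs);
            }
        }
        if (last != null) {
            throw last;
        }
        // Every attempt returned null without throwing.
        return null;
    }
}
```

Under these assumptions, a caller would wrap its existing counter lookup in a `Callable` (for example `() -> job.getCounters()`, where `job` is whatever job handle the caller already holds) and pass it to `fetchWithRetry` with a small attempt count and delay.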