[ https://issues.apache.org/jira/browse/MAPREDUCE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alejandro Abdelnur resolved MAPREDUCE-4109.
-------------------------------------------
Resolution: Invalid
After looking at the code, my assumptions proved incorrect; the scenario
described is not possible.
What may be happening is MAPREDUCE-3972.
> availability of a job info in HS should be atomic
> -------------------------------------------------
>
> Key: MAPREDUCE-4109
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4109
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, jobhistoryserver, mrv2
> Affects Versions: 2.0.0
> Reporter: Alejandro Abdelnur
> Priority: Blocker
> Fix For: 2.0.0
>
>
> It seems that the HS starts serving info about a job before it has all the
> info available.
> In the trace below, a RunningJob throws an NPE when trying to access the
> counters.
> This happens on and off, so I assume it is related to either the AM not
> flushing all job info to HDFS before notifying the HS, or the HS not loading
> all the job info from HDFS before it starts serving it.
> In case it helps to diagnose the issue, this is happening in a secure cluster.
> This causes Oozie to mark jobs as failed.
> {code}
> java.lang.NullPointerException
> at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getCounters(HistoryClientService.java:214)
> at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:149)
> at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:206)
> at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:355)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654)
> at LocalTrace:
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
> at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:163)
> at $Proxy31.getCounters(Unknown Source)
> at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getCounters(MRClientProtocolPBClientImpl.java:162)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
> at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:325)
> at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:472)
> at org.apache.hadoop.mapreduce.Job$8.run(Job.java:714)
> at org.apache.hadoop.mapreduce.Job$8.run(Job.java:711)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:711)
> at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
> at org.apache.oozie.action.hadoop.LauncherMapper.hasIdSwap(LauncherMapper.java:296)
> at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:886)
> at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:162)
> at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:51)
> at org.apache.oozie.command.XCommand.call(XCommand.java:260)
> at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:679)
> {code}
>
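Since the failure in the trace above is intermittent and surfaces in the caller (here Oozie) as a failed counters lookup, a client could in principle guard the call with a bounded retry rather than failing on the first attempt. The following is only a minimal sketch of that idea and not a fix for the root cause, which per the resolution above may be MAPREDUCE-3972. The names CounterRetry and fetchWithRetry are hypothetical and exist in neither Hadoop nor Oozie; the sketch uses only the JDK.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Hadoop or Oozie: retries a lookup that may
// intermittently fail or return null (for example, a job counters fetch).
final class CounterRetry {
    private CounterRetry() {}

    /**
     * Calls fetch up to maxAttempts times, sleeping sleepMillis between
     * attempts, and returns the first non-null result. If every attempt
     * fails, rethrows the most recent RuntimeException seen, if any;
     * otherwise returns null.
     */
    static <T> T fetchWithRetry(Supplier<T> fetch, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T value = fetch.get();
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        if (last != null) {
            throw last;
        }
        return null;
    }
}
```

A caller would pass the counters lookup as the supplier; any checked exception thrown by the real call would need to be wrapped in an unchecked one first. A bounded retry like this only masks a transient failure, so the attempt count and delay should stay small.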