[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved MAPREDUCE-4109.
-------------------------------------------

    Resolution: Invalid

After looking at the code, my assumptions proved incorrect; such a scenario 
is not possible.

What may be happening is MAPREDUCE-3972.
                
> availability of a job info in HS should be atomic
> -------------------------------------------------
>
>                 Key: MAPREDUCE-4109
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4109
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, jobhistoryserver, mrv2
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> It seems that the HS starts serving info about a job before it has all the 
> info available.
> In the trace below, a RunningJob throws an NPE when trying to access the 
> counters.
> This happens intermittently, so I assume it is related to either the AM not 
> flushing all job info to HDFS before notifying the HS, or the HS not loading 
> all the job info from HDFS before it starts serving it.
> In case it helps to diagnose the issue, this is happening in a secure cluster.
> This causes Oozie to mark jobs as failed.
> {code}
> java.lang.NullPointerException
>       at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getCounters(HistoryClientService.java:214)
>       at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:149)
>       at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:206)
>       at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:355)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654)
>  at LocalTrace: 
>       org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: 
>       at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:163)
>       at $Proxy31.getCounters(Unknown Source)
>       at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getCounters(MRClientProtocolPBClientImpl.java:162)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:616)
>       at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
>       at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:325)
>       at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:472)
>       at org.apache.hadoop.mapreduce.Job$8.run(Job.java:714)
>       at org.apache.hadoop.mapreduce.Job$8.run(Job.java:711)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:416)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>       at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:711)
>       at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
>       at org.apache.oozie.action.hadoop.LauncherMapper.hasIdSwap(LauncherMapper.java:296)
>       at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:886)
>       at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:162)
>       at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:51)
>       at org.apache.oozie.command.XCommand.call(XCommand.java:260)
>       at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:166)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:679)
> {code}
>  
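
For callers that hit an intermittent failure like the one in the trace, a client-side guard might look roughly like the sketch below. It is only an illustration, not code from Hadoop or Oozie: the class name `CounterFetchRetry`, the method `fetchWithRetry`, and its parameters are hypothetical, and the `Callable` merely stands in for whatever counters lookup the client performs. It assumes a failed or null lookup is transient and worth retrying a bounded number of times.

```java
import java.util.concurrent.Callable;

public class CounterFetchRetry {

    /**
     * Calls fetch up to maxAttempts times, sleeping delayMillis between attempts.
     * A null result or a thrown exception counts as a failed attempt.
     * Returns null if every attempt fails, so the caller decides how to react.
     * (Hypothetical helper; not part of any Hadoop or Oozie API.)
     */
    public static <T> T fetchWithRetry(Callable<T> fetch, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = fetch.call();
                if (result != null) {
                    return result;
                }
            } catch (InterruptedException e) {
                // Do not swallow interruption; let the caller handle it.
                throw e;
            } catch (Exception e) {
                // Assumed transient (e.g. a remote NPE); fall through and retry.
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stand-in for a counters lookup that fails twice before succeeding.
        String counters = fetchWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new NullPointerException("counters not available yet");
            }
            return "counters";
        }, 5, 10L);
        System.out.println(counters + " after " + calls[0] + " attempts");
    }
}
```

Returning null after the last attempt, rather than rethrowing, leaves the decision about marking a job as failed to the caller; whether that is appropriate depends on the client.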


        
