[ https://issues.apache.org/jira/browse/MAPREDUCE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508433#comment-13508433 ]

Devaraj K commented on MAPREDUCE-4841:
--------------------------------------

Sorry for the late response. Here, the AM is not crashing.

As Jason mentioned in the comment above, the first AM attempt fails to unregister
with the RM because the NM where the AM was launched is restarted, and before
unregistering, that first attempt has already removed the staging directory. All
subsequent AM attempts then fail with a FileNotFoundException (FNFE) because the
staging files are no longer present.
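To make the ordering problem concrete, here is a minimal, hypothetical simulation in plain Java. It does not use any Hadoop API: an in-memory set of paths stands in for the staging files on HDFS, and the names StagingCleanupSketch, run and Outcome are illustrative only. It assumes, as described above, that only the first attempt fails to unregister. The cleanupBeforeUnregister = false branch is an assumed alternative ordering (delete staging files only after a successful unregister), not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the MAPREDUCE-4841 failure. No Hadoop APIs are used:
// an in-memory set of paths stands in for the job's staging files on HDFS.
public class StagingCleanupSketch {

    static final String APP_TOKENS =
        "/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens";

    enum Outcome { UNREGISTER_FAILED, LOCALIZATION_FAILED, SUCCEEDED }

    // Runs up to maxAttempts AM attempts. Attempt 1 cannot unregister (its NM
    // was restarted). cleanupBeforeUnregister = true models the behaviour in the
    // comment; false models an assumed ordering that deletes the staging files
    // only after unregistering succeeds.
    static List<Outcome> run(int maxAttempts, boolean cleanupBeforeUnregister) {
        Set<String> hdfs = new HashSet<>();
        hdfs.add(APP_TOKENS);
        List<Outcome> outcomes = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // The NM must localize appTokens before the AM container can start;
            // a missing file corresponds to the FileNotFoundException below.
            if (!hdfs.contains(APP_TOKENS)) {
                outcomes.add(Outcome.LOCALIZATION_FAILED);
                continue;
            }
            if (cleanupBeforeUnregister) {
                hdfs.remove(APP_TOKENS);
            }
            boolean unregistered = attempt != 1;
            if (!unregistered) {
                outcomes.add(Outcome.UNREGISTER_FAILED);
                continue;
            }
            hdfs.remove(APP_TOKENS); // no-op if already deleted
            outcomes.add(Outcome.SUCCEEDED);
            break; // in this model no further attempt is launched after success
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(run(4, true));
        System.out.println(run(4, false));
    }
}
```

Running main prints [UNREGISTER_FAILED, LOCALIZATION_FAILED, LOCALIZATION_FAILED, LOCALIZATION_FAILED] for the cleanup-before-unregister ordering (the "failed 4 times" case) and [UNREGISTER_FAILED, SUCCEEDED] when cleanup waits for a successful unregister.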
                
> Application Master Retries fail due to FileNotFoundException
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-4841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4841
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>            Priority: Blocker
>
> Application attempt 1 deletes the job-related files, so they are no longer
> present in HDFS for the following retries.
> {code:xml}
> Application application_1353724754961_0001 failed 4 times due to AM Container for appattempt_1353724754961_0001_000004 exited with exitCode: -1000 due to: RemoteTrace:
> java.io.FileNotFoundException: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:752)
>  at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:88)
>  at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>  at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>  at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>  at java.lang.Thread.run(Thread.java:662)
>  at LocalTrace:
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>  at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>  at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:822)
>  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:492)
>  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:221)
>  at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>  at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> .Failing this attempt.. Failing the application.
> {code}

