[
https://issues.apache.org/jira/browse/IGNITE-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Puviarasu updated IGNITE-8890:
------------------------------
Labels: Ignite kerberos yarn (was: )
> Ignite YARN Kerberos - Delegation Token renewal
> -----------------------------------------------
>
> Key: IGNITE-8890
> URL: https://issues.apache.org/jira/browse/IGNITE-8890
> Project: Ignite
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.3
> Environment: Kerberos cluster
> Ignite Version : 2.3.0
> Module : Ignite-YARN
> Class : ApplicationMaster
>
> Reporter: Puviarasu
> Priority: Blocker
> Labels: Ignite, kerberos, yarn
>
> As Ignite-YARN is a long-running application in a YARN environment, it should
> have a mechanism to renew its delegation tokens.
> In Ignite-YARN, when the ApplicationMaster starts, it acquires delegation
> tokens and stores them in a ByteBuffer [Class: ApplicationMaster, Method: init()].
> This ByteBuffer with the token information is passed to every container
> received from the ResourceManager [Class: ApplicationMaster, Method:
> onContainersAllocated()].
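
One possible direction, sketched under assumptions: rather than caching the token ByteBuffer once in init(), the ApplicationMaster could rebuild it whenever it is older than some threshold. Everything named below (RefreshingTokenCache, fetcher, clock, maxAgeMs) is hypothetical and not part of Ignite. In a real fix the fetcher would presumably log in from a keytab and collect fresh tokens through Hadoop's FileSystem.addDelegationTokens and Credentials.writeTokenStorageToStream; that part is left out so the sketch stays self-contained.

```java
import java.nio.ByteBuffer;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: hands out serialized tokens and re-fetches them once they are too old. */
final class RefreshingTokenCache {
    private final Supplier<ByteBuffer> fetcher; // assumption: re-acquires and serializes fresh tokens
    private final LongSupplier clock;           // millisecond clock, injectable for testing
    private final long maxAgeMs;                // assumption: chosen well below the token lifetime

    private ByteBuffer cached;
    private long fetchedAt;

    RefreshingTokenCache(Supplier<ByteBuffer> fetcher, LongSupplier clock, long maxAgeMs) {
        this.fetcher = fetcher;
        this.clock = clock;
        this.maxAgeMs = maxAgeMs;
    }

    /** Returns a view of the current tokens, fetching new ones if none exist or they are stale. */
    synchronized ByteBuffer current() {
        long now = clock.getAsLong();
        if (cached == null || now - fetchedAt >= maxAgeMs) {
            cached = fetcher.get();
            fetchedAt = now;
        }
        // duplicate() shares the bytes but has its own position, so each container
        // launch context can read the buffer from the start.
        return cached.duplicate();
    }
}
```

With something like this, onContainersAllocated() would call current() for each container launch instead of reusing the buffer built at startup.
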
> Everything works fine until the delegation token expires.
> After that, the ApplicationMaster is no longer able to start
> Ignite inside the containers it receives, and the exception below occurs:
> *WARNING: Error launching container*
>
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager*$InvalidToken*):
> at org.apache.hadoop.ipc.Client.call(Client.java:1504)
> at org.apache.hadoop.ipc.Client.call(Client.java:1441)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2123)
> at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1253)
> at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1249)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1249)
> at org.apache.ignite.yarn.utils.IgniteYarnUtils.setupFile(IgniteYarnUtils.java:65)
> at org.apache.ignite.yarn.ApplicationMaster.onContainersAllocated(ApplicationMaster.java:131)
> at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:292)
>
> The ApplicationMaster keeps asking for more and more containers [Class:
> ApplicationMaster, Method: run()] but is not able to start Ignite inside any of
> them because of the expired/missing delegation token. The failed
> containers are not released when the exception occurs.
> *This repeats until all the resources in the cluster are allocated to
> Ignite. As a result, Ignite uses all resources in the cluster and
> no other jobs are able to run.*
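
Independently of token renewal, the runaway allocation described above could be contained by releasing a container whenever its launch fails and by pausing new requests after repeated failures. The sketch below only illustrates that idea: LaunchFailureGuard, Launcher, and maxConsecutiveFailures are hypothetical names, and the generic parameter C stands in for YARN's Container type so the example compiles without Hadoop on the classpath.

```java
import java.util.function.Consumer;

/** Hypothetical guard: releases containers whose launch failed and signals when to stop asking for more. */
final class LaunchFailureGuard<C> {
    /** Launch action that may throw, e.g. the IOException raised for an invalid token. */
    interface Launcher<C> {
        void launch(C container) throws Exception;
    }

    private final int maxConsecutiveFailures; // assumption: small threshold such as 3
    private final Consumer<C> releaser;       // assumption: returns the container to the ResourceManager
    private int consecutiveFailures;

    LaunchFailureGuard(int maxConsecutiveFailures, Consumer<C> releaser) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.releaser = releaser;
    }

    /** Attempts the launch; on failure the container is released and the failure is counted. */
    synchronized boolean tryLaunch(C container, Launcher<C> launcher) {
        try {
            launcher.launch(container);
            consecutiveFailures = 0; // a successful launch resets the counter
            return true;
        } catch (Exception e) {
            consecutiveFailures++;
            releaser.accept(container); // do not keep holding resources we cannot use
            return false;
        }
    }

    /** The request loop would check this before asking for another container. */
    synchronized boolean shouldRequestMore() {
        return consecutiveFailures < maxConsecutiveFailures;
    }
}
```

In the real ApplicationMaster the releaser would presumably wrap AMRMClientAsync.releaseAssignedContainer(container.getId()), and the loop in run() would consult shouldRequestMore() before adding another container request.
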
> Kindly help in resolving the issue.
> Thanks in Advance!!!
>