[ 
https://issues.apache.org/jira/browse/SPARK-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-15754.
------------------------------------
       Resolution: Fixed
         Assignee: Subroto Sanyal
    Fix Version/s: 2.0.0
                   1.6.2

> org.apache.spark.deploy.yarn.Client changes the credential of current user
> --------------------------------------------------------------------------
>
>                 Key: SPARK-15754
>                 URL: https://issues.apache.org/jira/browse/SPARK-15754
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>         Environment: Spark Client with Secured Hadoop Cluster
>            Reporter: Subroto Sanyal
>            Assignee: Subroto Sanyal
>            Priority: Critical
>             Fix For: 1.6.2, 2.0.0
>
>
> h5. Problem
> Spawning a SparkContext in Spark client mode changes the credentials of the 
> current user's UserGroupInformation. After that, the client (which spawned 
> the SparkContext) no longer talks to the Name Node using its TGT but uses 
> delegation tokens instead. It is undesirable for a library to change the 
> JVM-wide _UserGroupInformation_ context in this way.
> h5. Root Cause
> Spark creates HDFS delegation tokens so that the spawned Application Master 
> can communicate with the Name Node, but while creating these tokens Spark 
> also adds the delegation token to the current user's credentials.
> {code:title=org.apache.spark.deploy.yarn.Client.scala#createContainerLaunchContext|borderStyle=solid}
>     setupSecurityToken(amContainer)
>     UserGroupInformation.getCurrentUser().addCredentials(credentials)
>     amContainer
> {code}
> After this operation the client always uses the delegation token for any 
> further communication with the Name Node. This becomes dangerous when the 
> Resource Manager cancels the delegation token 10 minutes after the Spark 
> context shuts down, leading to client-side failures like:
> {noformat}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 444 for subroto) can't be found in cache
>       at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>       at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>       at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
>       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2095)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
>       at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1409)
>       at Sample.main(Sample.java:85)
> {noformat}
> There are other places in the code where a similar operation is performed, 
> e.g. in:
> _org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired()_
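For illustration, here is a minimal, self-contained Java sketch of the underlying problem and the defensive-copy alternative. The `Credentials` class below is a hypothetical stand-in, not the actual Hadoop `org.apache.hadoop.security.Credentials` API; the point is only the pattern: mutating the shared current-user credential set leaks the AM's delegation token into the client, whereas handing the AM a copy leaves the client untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a credential store such as Hadoop's Credentials.
class Credentials {
    private final Map<String, String> tokens = new HashMap<>();

    void addToken(String alias, String token) {
        tokens.put(alias, token);
    }

    boolean hasToken(String alias) {
        return tokens.containsKey(alias);
    }

    // Defensive copy: mutations of the copy never affect the original.
    Credentials copy() {
        Credentials c = new Credentials();
        c.tokens.putAll(this.tokens);
        return c;
    }
}

public class CredentialLeakDemo {
    public static void main(String[] args) {
        // Buggy pattern: the launcher adds the AM's delegation token directly
        // to the (shared) current-user credentials, so the client JVM starts
        // using the delegation token for all further Name Node calls.
        Credentials currentUser = new Credentials();
        currentUser.addToken("HDFS_DELEGATION_TOKEN", "token-444");
        System.out.println("leaked into client: "
                + currentUser.hasToken("HDFS_DELEGATION_TOKEN")); // true

        // Safer pattern: give the AM a copy of the credentials, leaving the
        // client's own credential set (and hence its TGT-based auth) intact.
        Credentials clientCreds = new Credentials();
        Credentials amCreds = clientCreds.copy();
        amCreds.addToken("HDFS_DELEGATION_TOKEN", "token-444");
        System.out.println("client still clean: "
                + !clientCreds.hasToken("HDFS_DELEGATION_TOKEN")); // true
    }
}
```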



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
