[ https://issues.apache.org/jira/browse/SPARK-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-15754.
------------------------------------
    Resolution: Fixed
      Assignee: Subroto Sanyal
Fix Version/s: 2.0.0
               1.6.2

> org.apache.spark.deploy.yarn.Client changes the credential of current user
> --------------------------------------------------------------------------
>
>                 Key: SPARK-15754
>                 URL: https://issues.apache.org/jira/browse/SPARK-15754
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1
>         Environment: Spark Client with Secured Hadoop Cluster
>            Reporter: Subroto Sanyal
>            Assignee: Subroto Sanyal
>            Priority: Critical
>             Fix For: 1.6.2, 2.0.0
>
> h5. Problem
> Spawning a SparkContext in Spark client mode changes the credentials in the current user's _UserGroupInformation_. After that, the client that spawned the SparkContext no longer talks to the NameNode using its TGT but uses delegation tokens instead. It is undesirable for any library to change this JVM-wide context.
> h5. Root Cause
> Spark creates HDFS delegation tokens so that the spawned ApplicationMaster can communicate with the NameNode, but while creating these tokens Spark also adds them to the current user's credentials:
> {code:title=org.apache.spark.deploy.yarn.Client#createContainerLaunchContext|borderStyle=solid}
> setupSecurityToken(amContainer)
> UserGroupInformation.getCurrentUser().addCredentials(credentials)
> amContainer{code}
> After this operation the client always uses the delegation token for any further communication with the NameNode. The scenario becomes dangerous when the ResourceManager cancels the delegation token 10 minutes after the SparkContext shuts down.
> This leads to client-side failures such as:
> {noformat}org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 444 for subroto) can't be found in cache
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1403)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> 	at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> 	at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2095)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1409)
> 	at Sample.main(Sample.java:85){noformat}
> There are other places in the code where a similar operation is performed, for example in _org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater.updateCredentialsIfRequired()_.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
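For context, the pattern the report argues for can be sketched as follows: serialize the application's delegation tokens into the AM container's launch context only, without also merging them into the caller's _UserGroupInformation_. This is an illustrative sketch in Scala against Hadoop's {{Credentials}} / YARN's {{ContainerLaunchContext}} APIs, not the actual patch that resolved this issue; the method shape mirrors the quoted {{setupSecurityToken}} call but is hypothetical.

{code:title=Illustrative sketch (not the actual fix)|borderStyle=solid}
import java.nio.ByteBuffer
import org.apache.hadoop.io.DataOutputBuffer
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext

// Hypothetical variant of setupSecurityToken: the tokens live in a
// standalone Credentials object and are attached only to the AM
// container's launch context.
def setupSecurityToken(amContainer: ContainerLaunchContext,
                       credentials: Credentials): Unit = {
  val dob = new DataOutputBuffer()
  // Serialize only the application's tokens for shipment to the AM.
  credentials.writeTokenStorageToStream(dob)
  amContainer.setTokens(ByteBuffer.wrap(dob.getData, 0, dob.getLength))
  // Deliberately no UserGroupInformation.getCurrentUser().addCredentials(...),
  // so the client keeps authenticating to the NameNode with its TGT.
}
{code}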