[ https://issues.apache.org/jira/browse/SPARK-31514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-31514.
----------------------------------
    Resolution: Invalid

Kerberos: Spark UGI credentials are not getting passed down to Hive
-------------------------------------------------------------------

                 Key: SPARK-31514
                 URL: https://issues.apache.org/jira/browse/SPARK-31514
             Project: Spark
          Issue Type: Question
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Sanchay Javeria
            Priority: Major

I'm using Spark 2.4 on a Kerberos-enabled cluster, where I'm trying to run a query via the {{spark-sql}} shell.

The simplified setup looks like this: a spark-sql shell running on one host in a YARN cluster -> an external Hive metastore running on a separate host -> S3 storing the table data.

When I launch the {{spark-sql}} shell with DEBUG logging enabled, this is what I see in the logs:

{code:java}
> bin/spark-sql --proxy-user proxy_user
...
DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for proxy_user against hive/_h...@realm.com at thrift://hive-metastore:9083
DEBUG UserGroupInformation: PrivilegedAction as:spark/spark_h...@realm.com (auth:KERBEROS) from:org.apache.spark.deploy.security.HiveDelegationTokenProvider.doAsRealUser(HiveDelegationTokenProvider.scala:130)
{code}

This means that Spark made a call to fetch the delegation token from the Hive metastore and then added it to the list of credentials for the UGI. [This is the piece of code|https://github.com/apache/spark/blob/branch-2.4/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L129] that does that. I also verified in the metastore logs that the {{get_delegation_token()}} call was being made.

Now, when I run a simple query like {{create table test_table (id int) location "s3://some/prefix";}}, I get hit with an AWS credentials error.
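For context, the fetched delegation token ends up keyed by a text alias in the UGI's credential store, and later lookups find it by that alias. A minimal stdlib-only sketch of that mechanism (Hadoop's {{Credentials}} class is modeled here as a plain map; the alias string is the one the Spark code uses, everything else is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class CredentialsSketch {
    // Stand-in for org.apache.hadoop.security.Credentials: alias -> opaque token bytes.
    static final Map<String, byte[]> credentials = new HashMap<>();

    // Conceptually what HiveDelegationTokenProvider does: store the fetched
    // token under a well-known alias so later lookups can find it.
    static void addHiveToken(byte[] tokenBytes) {
        credentials.put("hive.server2.delegation.token", tokenBytes);
    }

    public static void main(String[] args) {
        addHiveToken(new byte[] {1, 2, 3}); // placeholder token material
        // Any code in this JVM that knows the alias can retrieve the token...
        System.out.println(credentials.containsKey("hive.server2.delegation.token")); // true
        // ...but the store (like a real UGI) lives only inside this process.
    }
}
```

The real store is per-UGI state inside one JVM, which matters for the failure described above.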
I modified the Hive metastore code and added this right before the file system in Hadoop is initialized ([org/apache/hadoop/hive/metastore/Warehouse.java|#L116]):

{code:java}
public static FileSystem getFs(Path f, Configuration conf) throws MetaException {
  try {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    LOG.info("UGI information: " + ugi);
    Collection<Token<? extends TokenIdentifier>> tokens = ugi.getCredentials().getAllTokens();
    for (Token<?> token : tokens) {
      LOG.info(token);
    }
  } catch (IOException e) {
    e.printStackTrace();
  }
  ...
{code}

In the metastore logs, this does print the correct UGI information:

{code:java}
UGI information: proxy_user (auth:PROXY) via hive/hive-metast...@realm.com (auth:KERBEROS)
{code}

but there are no tokens present in the UGI. It looks like the [Spark code|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala#L101] adds it with the alias {{hive.server2.delegation.token}}, but I don't see it in the UGI. This makes me suspect that the UGI scope is somehow isolated and not being shared between spark-sql and the Hive metastore. How do I go about solving this? Any help will be really appreciated!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
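The suspicion at the end matches how UGI works in general: each JVM holds its own {{UserGroupInformation}}/{{Credentials}} state, so a token added inside the spark-sql process is never directly visible inside the metastore process; it would have to be transmitted over the Thrift connection itself. A toy stdlib-only illustration of that isolation (two maps stand in for the two processes' UGI credential stores; all names are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class UgiIsolationSketch {
    public static void main(String[] args) {
        // Each JVM has its own UGI credential store; model them as two maps.
        Map<String, byte[]> sparkSqlUgi = new HashMap<>();
        Map<String, byte[]> metastoreUgi = new HashMap<>();

        // spark-sql adds the delegation token to *its own* UGI only.
        sparkSqlUgi.put("hive.server2.delegation.token", new byte[] {42});

        // The metastore's UGI is untouched: nothing propagates between
        // processes unless the token is explicitly sent across (for example,
        // by using it to authenticate the Thrift connection).
        System.out.println(metastoreUgi.containsKey("hive.server2.delegation.token")); // false
    }
}
```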