Sanchay Javeria created SPARK-31514:
---------------------------------------
Summary: Kerberos: Spark UGI credentials are not getting passed down to Hive
Key: SPARK-31514
URL: https://issues.apache.org/jira/browse/SPARK-31514
Project: Spark
Issue Type: Question
Components: SQL
Affects Versions: 2.4.4
Reporter: Sanchay Javeria

I'm using Spark 2.4 on a Kerberos-enabled cluster, where I'm trying to run a query via the {{spark-sql}} shell. The simplified setup looks like this: a {{spark-sql}} shell running on one host in a YARN cluster -> an external Hive metastore running on another host -> S3 to store the table data.

When I launch the {{spark-sql}} shell with DEBUG logging enabled, this is what I see in the logs:

{code:java}
> bin/spark-sql --proxy-user proxy_user
...
DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for proxy_user against hive/_h...@realm.com at thrift://hive-metastore:9083
DEBUG UserGroupInformation: PrivilegedAction as:spark/spark_h...@realm.com (auth:KERBEROS) from:org.apache.spark.deploy.security.HiveDelegationTokenProvider.doAsRealUser(HiveDelegationTokenProvider.scala:130)
{code}

This means that Spark made a call to fetch a delegation token from the Hive metastore and then added it to the list of credentials for the UGI. [This is the piece of code|https://github.com/apache/spark/blob/branch-2.4/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L129] that does that. I also verified in the metastore logs that the {{get_delegation_token()}} call was being made.

Now when I run a simple query like {{create table test_table (id int) location "s3://some/prefix";}}, I get hit with an AWS credentials error.
I modified the Hive metastore code and added the following right before the Hadoop FileSystem is initialized ([org/apache/hadoop/hive/metastore/Warehouse.java|#L116]):

{code:java}
public static FileSystem getFs(Path f, Configuration conf) throws MetaException {
  try {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    LOG.info("UGI information: " + ugi);
    Collection<Token<? extends TokenIdentifier>> tokens = ugi.getCredentials().getAllTokens();
    for (Token<? extends TokenIdentifier> token : tokens) {
      LOG.info(token.toString());
    }
  } catch (IOException e) {
    e.printStackTrace();
  }
  ...
{code}

In the metastore logs, this does print the correct UGI information:

{code:java}
UGI information: proxy_user (auth:PROXY) via hive/hive-metast...@realm.com (auth:KERBEROS)
{code}

but there are no tokens present in the UGI. It looks like the [Spark code|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala#L101] adds the token under the alias {{hive.server2.delegation.token}}, but I don't see it in the UGI. This makes me suspect that the UGI scope is somehow isolated and not being shared between spark-sql and the Hive metastore. How do I go about solving this? Any help will be really appreciated!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org