Sanchay Javeria created SPARK-31514:
---------------------------------------

             Summary: Kerberos: Spark UGI credentials are not getting passed down to Hive
                 Key: SPARK-31514
                 URL: https://issues.apache.org/jira/browse/SPARK-31514
             Project: Spark
          Issue Type: Question
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Sanchay Javeria


I'm using Spark 2.4 on a Kerberos-enabled cluster, where I'm trying to run a query via the {{spark-sql}} shell.

The simplified setup basically looks like this: a spark-sql shell running on one host in a YARN cluster -> an external Hive metastore running on another host -> S3 to store the table data.

When I launch the {{spark-sql}} shell with DEBUG logging enabled, this is what 
I see in the logs:
{code:java}
> bin/spark-sql --proxy-user proxy_user

...
DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for proxy_user against hive/_h...@realm.com at thrift://hive-metastore:9083

DEBUG UserGroupInformation: PrivilegedAction as:spark/spark_h...@realm.com (auth:KERBEROS) from:org.apache.spark.deploy.security.HiveDelegationTokenProvider.doAsRealUser(HiveDelegationTokenProvider.scala:130){code}
This means that Spark made a call to fetch the delegation token from the Hive metastore and then added it to the UGI's list of credentials. [This is the piece of code|https://github.com/apache/spark/blob/branch-2.4/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L129] that does that. I also verified in the metastore logs that the {{get_delegation_token()}} call was being made.
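
For context, my understanding of the client-side bookkeeping is roughly the following (a minimal sketch against the Hadoop security API; {{hiveToken}} is a hypothetical stand-in for the result of the actual metastore call, and the alias is the one used in the linked code):
{code:java}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public class AttachHiveTokenSketch {
  // Attach a freshly fetched Hive delegation token to the current UGI
  // under the alias Spark appears to use. "hiveToken" would come from the
  // metastore's get_delegation_token() call.
  static void attach(Token<? extends TokenIdentifier> hiveToken) throws Exception {
    Credentials creds = new Credentials();
    creds.addToken(new Text("hive.server2.delegation.token"), hiveToken);
    // Merge into the current user's UGI so anything running as this user
    // can later pick the token up from its credential set.
    UserGroupInformation.getCurrentUser().addCredentials(creds);
  }
}
{code}
If that's right, the token only lives in the spark-sql process's UGI unless something explicitly forwards it to wherever the filesystem access actually happens.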

Now when I run a simple query like {{create table test_table (id int) location "s3://some/prefix";}} I get hit with an AWS credentials error. I modified the Hive metastore code and added the following right before the Hadoop FileSystem is initialized (in {{org/apache/hadoop/hive/metastore/Warehouse.java}}, around line 116):
{code:java}
  public static FileSystem getFs(Path f, Configuration conf) throws MetaException {
    try {
      // Log the current UGI and any delegation tokens attached to it.
      UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
      LOG.info("UGI information: " + ugi);
      Collection<Token<? extends TokenIdentifier>> tokens =
          ugi.getCredentials().getAllTokens();
      for (Token<? extends TokenIdentifier> token : tokens) {
        LOG.info(token.toString());
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
...
{code}
In the metastore logs, this does print the correct UGI information:
{code:java}
UGI information: proxy_user (auth:PROXY) via hive/hive-metast...@realm.com (auth:KERBEROS){code}
but there are no tokens present in the UGI. It looks like the [Spark code|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala#L101] adds the token under the alias {{hive.server2.delegation.token}}, but I don't see it in the UGI. This makes me suspect that the UGI scope is somehow isolated and not shared between spark-sql and the Hive metastore. How do I go about solving this? Any help will be really appreciated!
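
To illustrate what I mean by an isolated scope, here's a minimal sketch of what I suspect happens on the metastore side (the scenario is my guess; the Hadoop APIs are real):
{code:java}
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUgiSketch {
  public static void main(String[] args) throws Exception {
    // The metastore logs in from its own keytab...
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    // ...and builds a brand-new proxy UGI for the impersonated user.
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser("proxy_user", realUser);
    // A fresh proxy UGI carries no delegation tokens unless they are
    // explicitly copied in, so this prints 0, which would explain why
    // getAllTokens() in my Warehouse.java patch comes back empty.
    System.out.println(proxy.getCredentials().getAllTokens().size());
  }
}
{code}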


