[
https://issues.apache.org/jira/browse/SPARK-31514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-31514.
----------------------------------
Resolution: Invalid
> Kerberos: Spark UGI credentials are not getting passed down to Hive
> -------------------------------------------------------------------
>
> Key: SPARK-31514
> URL: https://issues.apache.org/jira/browse/SPARK-31514
> Project: Spark
> Issue Type: Question
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Sanchay Javeria
> Priority: Major
>
> I'm using Spark 2.4 on a Kerberos-enabled cluster, where I'm trying to run a
> query via the {{spark-sql}} shell.
> The simplified setup looks like this: a spark-sql shell running on one host
> in a YARN cluster -> an external Hive metastore running on another host -> S3
> storing the table data.
> When I launch the {{spark-sql}} shell with DEBUG logging enabled, this is
> what I see in the logs:
> {code:java}
> > bin/spark-sql --proxy-user proxy_user
> ...
> DEBUG HiveDelegationTokenProvider: Getting Hive delegation token for proxy_user against hive/[email protected] at thrift://hive-metastore:9083
> DEBUG UserGroupInformation: PrivilegedAction as:spark/[email protected] (auth:KERBEROS) from:org.apache.spark.deploy.security.HiveDelegationTokenProvider.doAsRealUser(HiveDelegationTokenProvider.scala:130){code}
> This means that Spark made a call to fetch the delegation token from the Hive
> metastore and then added it to the list of credentials for the UGI. [This is
> the piece of
> code|https://github.com/apache/spark/blob/branch-2.4/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala#L129]
> that does that. I also verified in the metastore logs that the
> {{get_delegation_token()}} call was being made.
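> For reference, here is a rough Java sketch of what that token-fetch step
> amounts to (my own illustration using the public Hadoop/Hive client APIs; the
> actual Spark code is Scala, and the class and method names below are mine):
> {code:java}
> // Illustrative sketch only, not Spark's implementation: fetch a Hive
> // delegation token and register it in the current process's UGI credentials.
> import org.apache.hadoop.hive.conf.HiveConf;
> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.security.Credentials;
> import org.apache.hadoop.security.UserGroupInformation;
> import org.apache.hadoop.security.token.Token;
> import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;
>
> public class HiveTokenSketch {
>   public static void fetchAndAddToken(HiveConf conf, String owner) throws Exception {
>     HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
>     try {
>       // The get_delegation_token() call that shows up in the metastore logs.
>       String tokenStr = client.getDelegationToken(
>           owner, conf.getVar(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL));
>       Token<AbstractDelegationTokenIdentifier> token = new Token<>();
>       token.decodeFromUrlString(tokenStr);
>       // Spark stores the token under this alias (see the code linked above).
>       Credentials creds = new Credentials();
>       creds.addToken(new Text("hive.server2.delegation.token"), token);
>       UserGroupInformation.getCurrentUser().addCredentials(creds);
>     } finally {
>       client.close();
>     }
>   }
> }
> {code}
> Note that credentials added this way live in the JVM that ran the code;
> nothing in this step transmits them to another process.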
> Now when I run a simple query like {{create table test_table (id int)
> location "s3://some/prefix";}} I get hit with an AWS credentials error. To
> debug, I modified the Hive metastore code and added the following right
> before the Hadoop {{FileSystem}} is initialized
> ([org/apache/hadoop/hive/metastore/Warehouse.java|#L116]):
> {code:java}
> public static FileSystem getFs(Path f, Configuration conf) throws MetaException {
>   try {
>     // Debug addition: dump the current UGI and any tokens in its credentials.
>     UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
>     LOG.info("UGI information: " + ugi);
>     Collection<Token<? extends TokenIdentifier>> tokens =
>         ugi.getCredentials().getAllTokens();
>     for (Token<? extends TokenIdentifier> token : tokens) {
>       LOG.info(token.toString());
>     }
>   } catch (IOException e) {
>     e.printStackTrace();
>   }
>   ...
> {code}
> In the metastore logs, this does print the correct UGI information:
> {code:java}
> UGI information: proxy_user (auth:PROXY) via hive/[email protected]
> (auth:KERBEROS){code}
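> That {{proxy_user (auth:PROXY) via hive/...}} shape is what
> {{UserGroupInformation.createProxyUser()}} produces, and as a quick
> standalone illustration (my own sketch, not metastore code), a proxy UGI
> built that way starts out with an empty credential set:
> {code:java}
> // Standalone illustration: a freshly created proxy UGI carries no tokens
> // until something explicitly adds them.
> import org.apache.hadoop.security.UserGroupInformation;
>
> public class ProxyUgiSketch {
>   public static void main(String[] args) throws Exception {
>     UserGroupInformation real = UserGroupInformation.getCurrentUser();
>     UserGroupInformation proxy =
>         UserGroupInformation.createProxyUser("proxy_user", real);
>     // Prints "proxy_user (auth:PROXY) via <real user>", then 0.
>     System.out.println(proxy);
>     System.out.println(proxy.getCredentials().getAllTokens().size());
>   }
> }
> {code}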
> However, no tokens are present in the UGI. It looks like the [Spark
> code|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala#L101]
> adds the token under the alias {{hive.server2.delegation.token}}, but I don't
> see it in the UGI on the metastore side (see the lookup sketch below). This
> makes me suspect that the UGI scope is somehow isolated, so credentials are
> not shared between the spark-sql process and the Hive metastore process. How
> do I go about solving this? Any help would be really appreciated!
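> To make that check explicit, a direct lookup by Spark's alias (same Hadoop
> APIs as the {{getFs()}} snippet above; the variable name is mine) can be
> dropped into the same debug block, and it would have to return the token
> rather than {{null}} if the credentials were actually shared:
> {code:java}
> // Hypothetical drop-in for the getFs() debug block above; also needs
> // org.apache.hadoop.io.Text on the imports. Looks up the token directly by
> // the alias Spark stores it under; with isolated UGIs this logs "null".
> Token<? extends TokenIdentifier> byAlias =
>     ugi.getCredentials().getToken(new Text("hive.server2.delegation.token"));
> LOG.info("Token under Spark's alias: " + byAlias);
> {code}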