[ 
https://issues.apache.org/jira/browse/HIVE-16913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068127#comment-16068127
 ] 

Steve Loughran commented on HIVE-16913:
---------------------------------------

You are going to need a multi-tenant Hive service, such as LLAP.  Or start a 
new Hive Tez app within a YARN cluster, as a new user.

The workflow would be the same as passing HDFS delegation tokens around:

# client starts query
# client enumerates all filesystems used in the query
# For each FS that supports delegation tokens, its DTs are requested and 
added to the list of tokens
# This list of tokens is serialized and sent with the query
# At the far end, these are unmarshalled and added to the UGI user entry for 
the caller (Hadoop RPC does this)
# When the service then does {{currentUser.doAs()}}, those DTs will be 
available. 
# When a new FS instance is looked up, it will be mapped to (user, URL), so no 
user shares a filesystem instance
# Hence the S3 session tokens will be available for auth by a new S3A/AWS 
authenticator *only* for that user's FS instance
# And when the call is finished, if the filesystems for that user are released, 
they get cleaned up.
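The steps above can be sketched as a minimal, self-contained model. In real Hadoop code the pieces would be {{FileSystem.addDelegationTokens()}}, {{org.apache.hadoop.security.Credentials}} for marshalling, and {{UserGroupInformation.doAs()}} at the far end; here plain JDK types stand in so the flow is runnable on its own, and all names and URIs are illustrative:

```java
import java.io.*;
import java.util.*;

// Simplified model of the delegation-token hand-off described above.
// A "token" is just (service URI -> opaque bytes); real DTs are richer.
public class TokenHandoff {

    // Step 2-3: enumerate filesystems, ask each token-issuing one for a DT.
    static Map<String, byte[]> collectTokens(List<String> filesystems) {
        Map<String, byte[]> tokens = new LinkedHashMap<>();
        for (String fsUri : filesystems) {
            // Only token-issuing schemes (e.g. hdfs://, s3a:// with session
            // credentials) contribute; a local file:// FS issues nothing.
            if (!fsUri.startsWith("file:")) {
                tokens.put(fsUri, ("token-for-" + fsUri).getBytes());
            }
        }
        return tokens;
    }

    // Step 4 (client side): serialize the token list to ship with the query.
    static byte[] marshal(Map<String, byte[]> tokens) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(tokens.size());
        for (Map.Entry<String, byte[]> e : tokens.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().length);
            out.write(e.getValue());
        }
        out.flush();
        return bos.toByteArray();
    }

    // Step 5 (far end): unmarshal and attach to the caller's user entry,
    // keyed per (user, FS URI) so no user shares a filesystem instance.
    static Map<String, byte[]> unmarshal(byte[] wire) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire));
        Map<String, byte[]> tokens = new LinkedHashMap<>();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String service = in.readUTF();
            byte[] token = new byte[in.readInt()];
            in.readFully(token);
            tokens.put(service, token);
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        List<String> fsUsed = Arrays.asList(
                "hdfs://nn:8020", "s3a://team-bucket", "file:///tmp");
        Map<String, byte[]> sent = collectTokens(fsUsed);
        Map<String, byte[]> received = unmarshal(marshal(sent));
        System.out.println(received.keySet());
    }
}
```

The point of the round-trip is that nothing server-side ever sees long-lived keys: only the short-lived, per-user tokens travel with the query, and they disappear when the user's filesystem instances are released.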

This is almost exactly what's done with HDFS access today, the big difference 
being that the delegation token is actually forwarded to HDFS itself (same for 
HBase). Here I'm saying "we only need to get it to the other client".

> Support per-session S3 credentials
> ----------------------------------
>
>                 Key: HIVE-16913
>                 URL: https://issues.apache.org/jira/browse/HIVE-16913
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>
> Currently, the credentials needed to support Hive-on-S3 (or any other 
> cloud storage) need to be added to hive-site.xml, either using a Hadoop 
> credential provider or by adding the keys to hive-site.xml in plain text 
> (insecure).
> This limits the use case to a single S3 key. If we configure per-bucket 
> S3 keys as described [here | 
> http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Configurations_different_S3_buckets]
>  it exposes access to all the buckets to all Hive users.
> It is possible that different sets of users do not want to share their 
> buckets with each other but still want to be able to process the data using 
> Hive. Enabling session-level credentials would help solve such use cases. 
> For example, currently this doesn't work:
> {noformat}
> set fs.s3a.secret.key=my_secret_key;
> set fs.s3a.access.key=my_access_key;
> {noformat}
> Because the metastore is unaware of the keys. This doesn't work either:
> {noformat}
> set fs.s3a.secret.key=my_secret_key;
> set fs.s3a.access.key=my_access_key;
> set metaconf:fs.s3a.secret.key=my_secret_key;
> set metaconf:fs.s3a.access.key=my_access_key;
> {noformat}
> This is because only certain metastore configurations, those defined in 
> {{HiveConf.MetaVars}}, are allowed to be set by the user. If we enabled the 
> approaches above, we could allow multiple S3 credentials on a per-session 
> basis.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
