[ 
https://issues.apache.org/jira/browse/HADOOP-14104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892979#comment-15892979
 ] 

Daryn Sharp commented on HADOOP-14104:
--------------------------------------

bq. ... how client handle namenode HA ...  we specify keyproviderservice in 
config file ...

The need for configs is the problem, so adding more configs is not the answer 
(aside: the NN HA token handling is an example of exactly what not to do).

bq. One thing I might recommend is that we don't query getServerDefaults after 
we get the KP initially. 

Enabling EZ on a cluster must not require a restart of every daemon and proxy 
service that communicates with said cluster.  The key provider can't be cached 
forever.

––

I reviewed Rushabh's approach with him this morning.  The main goal should be a 
config-free token acquisition and selection.  How do we get there?

The first challenge is: how does a client intelligently request a kms token, 
when needed, and from the right kms?  The NN is the authoritative and dynamic 
source for the correct kms, à la this patch.  Token acquisition should use the 
kp uri provided by the NN, and I'm not too worried about caching when a typical 
cluster has a few dozen app submits/sec (each implying a token request) vs. 
tens of thousands of NN ops/sec.  This is only a small part of the problem.
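To make the first challenge concrete, here's a minimal sketch of config-free 
token acquisition.  The interfaces and names below are illustrative stand-ins, 
not actual Hadoop APIs: at job submit the client asks the NN (via a 
getServerDefaults-style call) for its current key provider uri, and only 
requests a kms token when the NN actually reports one.

```java
import java.util.Optional;

// Illustrative sketch: the NN, not client config, tells the client which
// kms (if any) to get a token from.  NameNode and KmsToken are stand-ins
// for the real Hadoop types, not actual APIs.
public class KmsTokenAcquisition {

    /** Stand-in for the NN's getServerDefaults()-style response. */
    interface NameNode {
        Optional<String> getKeyProviderUri();
    }

    /** Stand-in for a delegation token tagged with the kms it came from. */
    record KmsToken(String kmsUri) {}

    static Optional<KmsToken> acquireKmsToken(NameNode nn) {
        // Only request a kms token when the NN reports a key provider,
        // i.e. encryption zones are actually in use on that cluster.
        return nn.getKeyProviderUri().map(KmsToken::new);
    }

    public static void main(String[] args) {
        NameNode ezCluster = () -> Optional.of("kms://https@kms.example.com:9600/kms");
        NameNode plainCluster = Optional::empty;

        System.out.println(acquireKmsToken(ezCluster));    // a token bound to the NN-reported kms
        System.out.println(acquireKmsToken(plainCluster)); // empty: no EZ, no token needed
    }
}
```

The point is only that the NN-reported uri drives acquisition; no keyprovider 
config is consulted on the client.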

The second challenge is: how does a client select the correct kms for a given 
NN?  The client could again ask the NN, but then you stumble into the morass of 
caching.  As soon as the NN reports a different key provider than the one in 
effect when a job launched, clients won't be able to find a token for the new 
kms - even when the old one is still legit.  Now jobs fail that should/could 
have completed.  It's very messy.  The simpler answer is that a client should 
always use the key provider for a given NN as it existed when the token was 
acquired (i.e. at job submit).

So how do we implement a config-free mapping of NN to key provider?  When the 
hdfs and kms tokens are acquired, we need a way to later associate them as a 
pair.  I think the cleanest/most-compatible way is leveraging the Credentials 
instead of the config.  We could inject a mapping of filesystem uri to kms uri 
via the secrets map.  Then when the client needs to talk to the kms it can 
check the map, and fall back to getServerDefaults otherwise.
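A minimal sketch of that mapping.  The Map below stands in for the Credentials 
secrets map (in Hadoop: Credentials.addSecretKey/getSecretKey), and the alias 
prefix is a hypothetical naming scheme, not a real key:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative sketch of the proposed filesystem-uri -> kms-uri pairing
// carried in the job's Credentials.  The secrets Map and alias prefix
// are stand-ins, not actual Hadoop identifiers.
public class KmsUriMapping {

    static final String ALIAS_PREFIX = "dfs.kms.uri:"; // hypothetical alias scheme

    /** At token acquisition (job submit), record which kms goes with this NN. */
    static void recordMapping(Map<String, byte[]> secrets, String fsUri, String kmsUri) {
        secrets.put(ALIAS_PREFIX + fsUri, kmsUri.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Later, when the client needs the kms for a given NN: check the map
     * first, so the job keeps using the provider that existed at submit
     * time, and only fall back to a getServerDefaults-style fresh lookup
     * when no mapping was recorded.
     */
    static String lookupKmsUri(Map<String, byte[]> secrets, String fsUri,
                               Supplier<String> serverDefaultsFallback) {
        byte[] recorded = secrets.get(ALIAS_PREFIX + fsUri);
        return recorded != null
                ? new String(recorded, StandardCharsets.UTF_8)
                : serverDefaultsFallback.get();
    }
}
```

Because the mapping rides along in the Credentials, it ships with the job and 
survives a key-provider change on the NN, which is exactly the failure mode 
described above.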


> Client should always ask namenode for kms provider path.
> --------------------------------------------------------
>
>                 Key: HADOOP-14104
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14104
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HADOOP-14104-trunk.patch, HADOOP-14104-trunk-v1.patch
>
>
> According to the current implementation of the kms provider in client conf, 
> there can only be one kms.
> In a multi-cluster environment, if a client is reading encrypted data from 
> multiple clusters it will only get a kms token for the local cluster.
> Not sure whether the target version is correct or not.



