[ 
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294730#comment-14294730
 ] 

Christopher Tubbs commented on ACCUMULO-3513:
---------------------------------------------

bq. I'm not sure how we can make any reliable security model if we operate 
under the assumption that YARN is insecure. We have to trust that the YARN task 
was correctly authenticated.

Right, we have to authenticate both YARN *and* the end user. Even if YARN 
doesn't work this way, and it uses some delegation token instead of any 
identifying information about itself, Accumulo's implementation requires a 
Kerberos token at the transport layer. You can't just omit a Kerberos token and 
replace it with a delegation token in Accumulo's implementation (nor do I think 
it'd be a good idea to try, because I do think we need to authenticate the 
middle-man, in this case YARN).

bq. Again. We have to assume YARN is doing the right thing.

No, we absolutely do not have to make any such assumption. We can validate it 
by whitelisting only approved, trusted intermediaries. This is no different 
from X.509 extensions that designate permitted uses on certificates. The fact 
that a certificate was signed by the same CA does not automatically make it 
appropriate for signing executable code, or for encrypting email. The only 
catch is that Kerberos does not have any such mechanism built in, like X.509 
certificate extensions do, so a whitelist is the only option.
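To make the analogy concrete, here is a minimal sketch of the whitelist idea: the server checks the already-Kerberos-authenticated principal of the connecting process against a fixed set of approved intermediaries before accepting a delegation token from it. All names (class, method, principals) are illustrative assumptions, not Accumulo's actual API.

```java
import java.util.Set;

// Hypothetical sketch: only whitelisted, Kerberos-authenticated principals
// may present delegation tokens on behalf of end users -- analogous to
// X.509 extended-key-usage restrictions on what a certificate may be used for.
public class TrustedIntermediaries {

    // Illustrative principal names; in practice this would be configuration.
    private static final Set<String> WHITELIST = Set.of(
            "yarn/resourcemanager.example.com@EXAMPLE.COM",
            "yarn/nodemanager01.example.com@EXAMPLE.COM");

    // The caller has ALREADY been authenticated at the transport layer;
    // this check only decides whether that identity may act as a middle-man.
    public static boolean mayPresentDelegationToken(String authenticatedPrincipal) {
        return WHITELIST.contains(authenticatedPrincipal);
    }

    public static void main(String[] args) {
        System.out.println(mayPresentDelegationToken(
                "yarn/nodemanager01.example.com@EXAMPLE.COM"));
        System.out.println(mayPresentDelegationToken(
                "mapred/rogue.example.com@EXAMPLE.COM"));
    }
}
```

The point is that being a valid principal in the same realm (like being signed by the same CA) is not, by itself, permission to act as an intermediary.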

bq. The code running inside a YARN task is untrusted (unless you restrict job 
submission and vet the users externally – hit the users with a stick and tell 
them to behave). We should not be trusting this code to act as the user that it 
should.

That's just my point... you don't know what is going on inside the YARN system. 
For all you know, there is a job accessing the local disk or system memory, 
searching for other clients' credentials, and using them to connect to 
Accumulo. Just because YARN tries to connect using some client's credentials 
doesn't mean it's a valid use (granted, that takes effort). You've got to 
actually lock down your YARN instance and vet the infrastructure and the code 
it runs before you can be sure that the credentials a job in YARN uses to 
connect to Accumulo are being used for a legitimate purpose. But, once this is 
done, the precise degree of additional security offered by the delegation token 
(due to expirable attributes, for instance) is debatable... but I concede that 
it is at least marginally better than without, so we can move past that point 
if you like. If it has the ability to expire, I'm in favor.
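The "ability to expire" property I'm conceding is worth something can be sketched in a few lines: the token carries an expiration timestamp that the server checks on every use, so a stolen copy eventually goes dead, unlike a long-lived keytab. The class and field names below are my own illustration, not any real implementation.

```java
import java.time.Instant;

// Hypothetical sketch of an expirable delegation token. A stolen token
// becomes useless once its expiration passes, which is the marginal
// improvement over stealing long-lived Kerberos credentials.
public class ExpirableToken {
    public final String owner;       // end user the token represents
    public final Instant expiresAt;  // hard cutoff checked on every use

    public ExpirableToken(String owner, Instant expiresAt) {
        this.owner = owner;
        this.expiresAt = expiresAt;
    }

    // Server-side check performed before honoring the token.
    public boolean isValidAt(Instant now) {
        return now.isBefore(expiresAt);
    }
}
```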

bq. The shared secret is acting in place of the kerberos credentials because 
there is no credentials available for use. ...

I'm not so sure that's true. There are no credentials available to use which 
represent the end user, but the YARN process itself should have some Kerberos 
identity, shouldn't it? I've read that paper, and the quoted portion, but I had 
assumed (perhaps incorrectly) that the YARN process would use its own Kerberos 
credentials to set up the transport layer, over which it sends the delegation 
token for additional validation and authorization. I assumed the wording about 
it using a delegation token in place of a Kerberos token was just shorthand for 
something a bit more complicated. Otherwise, what network protocol is it using 
that supports both Kerberos and a delegation token? Even if HDFS/YARN is using 
some custom protocol which supports both (or two RPC endpoints), Accumulo's 
SASL implementation certainly is not... it needs *some* Kerberos credentials to 
set up the transport layer before we can send any delegation token or whatever 
across.
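The layering I'm assuming can be modeled in a few lines: the transport cannot come up at all without some Kerberos identity, and only once it exists can a delegation token ride on top of it. So the transport authenticates the intermediary (the YARN process) while the token authenticates the end user. This is only a model of the argument; the class and method names are mine, not Accumulo's SASL code.

```java
// Hypothetical model of the two-layer authentication being discussed.
public class LayeredAuth {

    // Stands in for a SASL/Kerberos transport: it cannot exist without
    // some authenticated Kerberos principal on the far side.
    public static class KerberosTransport {
        public final String peerPrincipal;

        public KerberosTransport(String peerPrincipal) {
            if (peerPrincipal == null || peerPrincipal.isEmpty()) {
                throw new IllegalStateException(
                        "no Kerberos credentials: transport cannot be established");
            }
            this.peerPrincipal = peerPrincipal;
        }
    }

    // The delegation token names the end user; the transport already names
    // the intermediary. Authorization can consider both identities.
    public static String authorize(KerberosTransport transport,
                                   String delegationTokenOwner) {
        return delegationTokenOwner + " via " + transport.peerPrincipal;
    }
}
```

Under this model, omitting the Kerberos credentials is not an option: the delegation token has no channel to travel over until the intermediary has authenticated itself.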

> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>
>                 Key: ACCUMULO-3513
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3513
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Blocker
>             Fix For: 1.7.0
>
>
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop 
> to help get a picture about what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to 
> submit a job; the notion of delegation tokens is then used for further 
> communication, since the servers do not have access to the client's sensitive 
> information. A centralized service manages creation of a delegation token, 
> which is a record containing certain information (such as the submitting 
> user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to 
> manage delegation tokens for node managers to acquire and use to run jobs. 
> Hadoop and HBase both contain code which implements this general idea, but we 
> will need to apply it to Accumulo and verify that M/R jobs still work in 
> a kerberized environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
