[
https://issues.apache.org/jira/browse/ACCUMULO-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294730#comment-14294730
]
Christopher Tubbs commented on ACCUMULO-3513:
---------------------------------------------
bq. I'm not sure how we can make any reliable security model if we operate
under the assumption that YARN is insecure. We have to trust that the YARN task
was correctly authenticated.
Right, we have to authenticate both YARN *and* the end user. Even if YARN
doesn't work this way, and it uses some delegation token instead of any
identifying information about itself, Accumulo's implementation requires a
Kerberos token at the transport layer. You can't just omit a Kerberos token and
replace it with a delegation token in Accumulo's implementation (nor do I think
it'd be a good idea to try, because I do think we need to authenticate the
middle-man, in this case YARN).
bq. Again. We have to assume YARN is doing the right thing.
No, we absolutely do not have to make any such assumption. We can enforce it by
whitelisting only approved, trusted intermediaries. This is no different from
X.509 extensions that designate permitted uses on certificates. The fact that a
certificate was signed by the same CA does not automatically make it
appropriate for signing executable code or for encrypting email. The only catch
is that Kerberos has no such built-in mechanism, unlike X.509 certificate
extensions, so a whitelist is the only option.
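A minimal sketch of that whitelist idea, analogous to X.509 extended-key-usage checks: a Kerberos ticket proves *who* connected, and a separate, explicit list restricts *which* authenticated principals may act as intermediaries. The principal names and function below are illustrative assumptions, not Accumulo's actual API.

```python
# Hypothetical whitelist of principals trusted to act as intermediaries.
# Authentication (the Kerberos ticket) proves identity; this check enforces
# permitted use, much like X.509 extended-key-usage extensions do.
TRUSTED_INTERMEDIARIES = {
    "yarn/resourcemanager@EXAMPLE.COM",
    "yarn/nodemanager@EXAMPLE.COM",
}

def may_act_as_intermediary(authenticated_principal: str) -> bool:
    """Return True only for principals explicitly approved as middle-men."""
    return authenticated_principal in TRUSTED_INTERMEDIARIES
```

Any principal the same KDC vouches for still fails this check unless it was deliberately added to the list.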
bq. The code running inside a YARN task is untrusted (unless you restrict job
submission and vet the users externally – hit the users with a stick and tell
them to behave). We should not be trusting this code to act as the user that it
should.
That's just my point... you don't know what is going on inside the YARN system.
For all you know, there is a job accessing the local disk or system memory,
searching for other clients' credentials, and using them to connect to
Accumulo. Just because YARN tries to connect using some client's credentials
doesn't mean it's a valid use. You've got to actually lock down your YARN
instance, and vet the infrastructure and the code it runs (granted, that takes
effort), before you can be sure that the credentials a YARN job uses to connect
to Accumulo are being used for a legitimate purpose. But once that is done, the
degree of additional security the delegation token offers (due to expirable
attributes, for instance) is debatable... I concede that it is at least
marginally better than without, so we can move past that point if you like. If
it has the ability to expire, I'm in favor.
bq. The shared secret is acting in place of the kerberos credentials because
there is no credentials available for use. ...
I'm not so sure that's true. There are no credentials representing the end user
available for use, but the YARN process itself should have some Kerberos
identity, shouldn't it? I've read that paper, and the quoted portion, but I had
assumed (perhaps incorrectly) that the YARN process would use its own Kerberos
credentials to set up the transport layer, over which it sends the delegation
token for additional validation and authorization. I assumed the wording about
it using a delegation token in place of a Kerberos token was just shorthand for
something a bit more complicated. Otherwise, what network protocol is it using
that supports both Kerberos and a delegation token? Even if HDFS/YARN is using
some custom protocol which supports both (or two RPC endpoints), Accumulo's
SASL implementation certainly is not... it needs *some* Kerberos credentials to
set up the transport layer before we can send any delegation token or whatever
across.
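The layering assumed above can be sketched as a two-step check: the intermediary's own Kerberos identity must establish the transport first, and only then is the end user's delegation token examined on top of it. The function and field names here are illustrative assumptions, not Accumulo's SASL code.

```python
import time

def token_is_valid(token):
    # Placeholder validation: the token names a user and has not expired.
    return "user" in token and token.get("expires", 0) > time.time()

def authenticate(transport_principal, delegation_token):
    """Two layers: Kerberos secures the transport, the token names the end user."""
    if transport_principal is None:
        # The SASL transport needs *some* Kerberos credentials first;
        # a delegation token alone cannot establish the transport layer.
        raise PermissionError("no Kerberos credentials for the transport layer")
    if not token_is_valid(delegation_token):
        raise PermissionError("delegation token failed validation")
    return delegation_token["user"]  # the end user the token identifies
```

Under this model, omitting the Kerberos step fails before the delegation token is ever consulted, which is the point being made above.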
> Ensure MapReduce functionality with Kerberos enabled
> ----------------------------------------------------
>
> Key: ACCUMULO-3513
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3513
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Blocker
> Fix For: 1.7.0
>
>
> I talked to [~devaraj] today about MapReduce support running on secure Hadoop
> to help get a picture about what extra might be needed to make this work.
> Generally, in Hadoop and HBase, the client must have valid credentials to
> submit a job, then the notion of delegation tokens is used for further
> communication since the servers do not have access to the client's sensitive
> information. A centralized service manages creation of a delegation token
> which is a record which contains certain information (such as the submitting
> user name) necessary to securely identify the holder of the delegation token.
> The general idea is that we would need to build support into the master to
> manage delegation tokens which node managers can acquire and use to run jobs.
> Hadoop and HBase both contain code which implements this general idea, but we
> will need to apply it to Accumulo and verify that M/R jobs still work in a
> kerberized environment.
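The delegation-token record described above can be sketched as a set of identifying fields (owner, expiry) signed with a secret held only by the issuing service, so the holder can later be identified securely. The field names and HMAC scheme below are assumptions for illustration, not Accumulo's actual token format.

```python
import hashlib
import hmac
import json
import time

# Secret known only to the centralized token-issuing service (illustrative).
SERVICE_SECRET = b"known-only-to-the-token-issuing-service"

def issue_token(owner, lifetime_s=3600):
    """Create a token: identifying fields plus a signature over them."""
    ident = {"owner": owner, "expires": time.time() + lifetime_s}
    payload = json.dumps(ident, sort_keys=True).encode()
    sig = hmac.new(SERVICE_SECRET, payload, hashlib.sha256).hexdigest()
    return {"ident": ident, "password": sig}

def verify_token(token):
    """Recompute the signature and check expiry; tampering breaks the HMAC."""
    payload = json.dumps(token["ident"], sort_keys=True).encode()
    expected = hmac.new(SERVICE_SECRET, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, token["password"])
            and token["ident"]["expires"] > time.time())
```

Because only the service holds the secret, a holder can present the token to prove the embedded identity but cannot alter its fields.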
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)