[
https://issues.apache.org/jira/browse/HADOOP-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678881#action_12678881
]
Kan Zhang commented on HADOOP-4343:
-----------------------------------
Here is the authentication design I plan to implement.
For all Hadoop services except NN (the NameNode), we simply use Kerberos. For
NN, we complement Kerberos with a second mechanism called
[DIGEST-MD5|http://www.ietf.org/rfc/rfc2831.txt] (available from the Java SASL
library). A client can authenticate to NN in two ways.
* *Kerberos only* For example, a user accessing HDFS using Hadoop fs commands
may use this approach.
* *Kerberos + DIGEST-MD5* In this case, Kerberos is used for the initial
authentication and setting up a secure connection between a client and NN.
After that, the client can obtain a secret key from the server over the secure
connection. This secret key is known only to the client and NN, and can be used
by the client to authenticate to NN on subsequent accesses. Authentication
using the secret key is done using the DIGEST-MD5 protocol, which doesn't
involve any third party, such as Kerberos KDC (key distribution center). The
client can also delegate the secret key to others, so that they may use the key
to authenticate to NN as the client. This is useful in cases where an M/R
job needs to access NN as the job owner. Hereinafter, we refer to the secret
key as a *delegation token*. The reasons for introducing delegation tokens (and
the associated DIGEST-MD5 mechanism) are as follows.
** *Performance* On a Map/Reduce cluster, there can be thousands of Tasks
running at the same time. If they use Kerberos to authenticate to a NN, they
need either a delegated TGT (ticket granting ticket) or a delegated service
ticket. If using delegated TGT, the Kerberos KDC could become a bottleneck,
since each task needs to get a Kerberos service ticket from the KDC using the
delegated TGT. Using delegation tokens avoids that network traffic to the
KDC. Another option is to use a delegated service ticket. Delegated service
tickets can be used in a similar fashion as delegation tokens, i.e., without
the need to contact an online third party like the KDC. However, Java GSS-API
doesn't support service ticket delegation. We may need to use a third-party
(native) Kerberos library, which requires significantly more development
effort and makes the code less portable.
** *Credential renewal* For Tasks to use Kerberos, the Task owner's Kerberos
TGT or service ticket needs to be delegated and made available to the Tasks.
Both TGT and service ticket can be renewed for long-running jobs (up to max
lifetime set at initial issuing). However, during Kerberos renewal, a new TGT
or service ticket will be issued, which needs to be distributed to all running
Tasks. If using delegation tokens, the renewal mechanism can be designed in
such a way that only the validity period of a token is extended on the NN, but
the token itself stays the same. Hence, no new tokens need to be issued and
pushed to running Tasks. Moreover, renewing Kerberos tickets has to be done
before the current validity period expires, which puts a timing constraint on
the renewal operation. Our delegation tokens can be renewed (or revived) after
the current validity period expires (but within the max lifetime) by the
designated renewer. Being able to renew an expired delegation token is not
considered a big risk since (unlike Kerberos) only the designated renewer can
renew a token. A stolen token can't be renewed by the attacker.
** *Less damage when a credential is compromised* A user's Kerberos TGT may be
used to access services other than HDFS. If a delegated TGT is used and
compromised, the damage is greater than if an HDFS-only credential
(delegation token) were compromised. On the other hand, a delegated service
ticket is equivalent to a delegation token in this respect.
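To make the "Kerberos + DIGEST-MD5" path more concrete, here is a minimal sketch of a DIGEST-MD5 handshake over a shared secret using the standard Java SASL API, run in a single process. It is an illustration only, not the planned implementation. The class name, the use of a token identifier as the SASL user name, and the "hdfs"/"namenode" protocol and server names are all assumptions made for the example.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

// Hypothetical sketch: both ends of a DIGEST-MD5 exchange in one process.
final class DigestTokenAuthSketch {
    static final String PROTOCOL = "hdfs";        // assumed protocol name
    static final String SERVER_NAME = "namenode"; // assumed server name

    // Returns the authorized ID on success; throws SaslException on failure.
    // nnSecrets stands in for the NN's table of issued secrets (an assumption).
    static String authenticate(String tokenId, char[] clientSecret,
                               Map<String, char[]> nnSecrets) throws SaslException {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, "auth"); // authentication only

        // Client side: supplies the token ID as user name and the secret as password.
        CallbackHandler clientHandler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName(tokenId);
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(clientSecret);
                } else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText()); // accept the server's realm
                } else {
                    throw new UnsupportedCallbackException(cb);
                }
            }
        };

        // Server (NN) side: looks up the secret for the presented token ID.
        CallbackHandler serverHandler = callbacks -> {
            String presentedId = null;
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    presentedId = ((NameCallback) cb).getDefaultName();
                }
            }
            for (Callback cb : callbacks) {
                if (cb instanceof PasswordCallback) {
                    char[] secret = nnSecrets.get(presentedId);
                    if (secret == null) {
                        throw new IOException("unknown token: " + presentedId);
                    }
                    ((PasswordCallback) cb).setPassword(secret);
                } else if (cb instanceof AuthorizeCallback) {
                    AuthorizeCallback ac = (AuthorizeCallback) cb;
                    boolean ok = ac.getAuthenticationID().equals(ac.getAuthorizationID());
                    ac.setAuthorized(ok);
                    if (ok) {
                        ac.setAuthorizedID(ac.getAuthorizationID());
                    }
                } else if (!(cb instanceof NameCallback) && !(cb instanceof RealmCallback)) {
                    throw new UnsupportedCallbackException(cb);
                }
            }
        };

        SaslServer server = Sasl.createSaslServer(
                "DIGEST-MD5", PROTOCOL, SERVER_NAME, props, serverHandler);
        SaslClient client = Sasl.createSaslClient(
                new String[] {"DIGEST-MD5"}, null, PROTOCOL, SERVER_NAME, props, clientHandler);
        try {
            // No third party (such as a KDC) takes part in this exchange.
            byte[] challenge = server.evaluateResponse(new byte[0]);
            byte[] response = client.evaluateChallenge(challenge);
            byte[] rspAuth = server.evaluateResponse(response); // throws on a wrong secret
            client.evaluateChallenge(rspAuth);
            if (!server.isComplete() || !client.isComplete()) {
                throw new SaslException("handshake did not complete");
            }
            return server.getAuthorizationID();
        } finally {
            client.dispose();
            server.dispose();
        }
    }
}
```

In a real deployment the challenge and response bytes would travel over the RPC connection rather than between two local objects. A holder of the wrong secret fails at the second `evaluateResponse` call with a `SaslException`.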
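The renewal semantics described under *Credential renewal* can be sketched as follows. This is a rough model under stated assumptions, not the actual design: the class and method names are hypothetical, times are plain numbers passed in by the caller, and "jobtracker" in the usage note is only an example of a designated renewer. It shows that renewal changes only the validity period kept on the NN side, that an expired token can be revived within its max lifetime, and that only the designated renewer may renew.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of NN-side bookkeeping for delegation token validity.
final class DelegationTokenRenewalSketch {
    private static final class Entry {
        final String renewer; // the only principal allowed to renew
        final long maxDate;   // hard upper bound set at issue time
        long expiry;          // end of the current validity period

        Entry(String renewer, long maxDate, long expiry) {
            this.renewer = renewer;
            this.maxDate = maxDate;
            this.expiry = expiry;
        }
    }

    private final Map<String, Entry> tokens = new HashMap<>();
    private final long renewInterval;

    DelegationTokenRenewalSketch(long renewInterval) {
        this.renewInterval = renewInterval;
    }

    // Records a newly issued token; the token itself never changes afterwards.
    void issue(String tokenId, String renewer, long now, long maxLifetime) {
        long maxDate = now + maxLifetime;
        tokens.put(tokenId, new Entry(renewer, maxDate, Math.min(now + renewInterval, maxDate)));
    }

    boolean isValid(String tokenId, long now) {
        Entry e = tokens.get(tokenId);
        return e != null && now < e.expiry;
    }

    // Extends the validity period in place and returns the new expiry.
    // An already expired token can be revived as long as maxDate is not reached.
    long renew(String tokenId, String caller, long now) {
        Entry e = tokens.get(tokenId);
        if (e == null) {
            throw new IllegalArgumentException("unknown token: " + tokenId);
        }
        if (!e.renewer.equals(caller)) {
            throw new SecurityException(caller + " is not the designated renewer");
        }
        if (now >= e.maxDate) {
            throw new IllegalStateException("max lifetime exceeded");
        }
        e.expiry = Math.min(now + renewInterval, e.maxDate);
        return e.expiry;
    }
}
```

For example, with a renew interval of 100 and a max lifetime of 1000, a token issued at time 0 expires at 100. A call to `renew("t1", "jobtracker", 150)` revives it until 250 without anything being pushed to running Tasks, while the same call from any other principal throws a `SecurityException`.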
> Adding user and service-to-service authentication to Hadoop
> -----------------------------------------------------------
>
> Key: HADOOP-4343
> URL: https://issues.apache.org/jira/browse/HADOOP-4343
> Project: Hadoop Core
> Issue Type: New Feature
> Reporter: Kan Zhang
> Assignee: Kan Zhang
>
> Currently, Hadoop services do not authenticate users or other services. As a
> result, Hadoop is subject to the following security risks.
> 1. A user can access an HDFS or M/R cluster as any other user. This makes it
> impossible to enforce access control in an uncooperative environment. For
> example, file permission checking on HDFS can be easily circumvented.
> 2. An attacker can masquerade as Hadoop services. For example, user code
> running on an M/R cluster can register itself as a new TaskTracker.
> This JIRA is intended to be a tracking JIRA, where we discuss requirements,
> agree on a general approach and identify subtasks. Detailed design and
> implementation are the subject of those subtasks.