[ 
https://issues.apache.org/jira/browse/HADOOP-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678881#action_12678881
 ] 

Kan Zhang commented on HADOOP-4343:
-----------------------------------

Here is the authentication design I plan to implement. 

For all Hadoop services except NN, we simply use Kerberos. For NN, we 
complement Kerberos with a second mechanism called 
[DIGEST-MD5|http://www.ietf.org/rfc/rfc2831.txt] (available from the Java SASL 
library). A client can authenticate to NN in two ways. 
* *Kerberos only* For example, a user accessing HDFS using Hadoop fs commands 
may use this approach.
* *Kerberos + DIGEST-MD5*  In this case, Kerberos is used for the initial 
authentication and setting up a secure connection between a client and NN. 
After that, the client can obtain a secret key from the server over the secure 
connection. This secret key is known only to the client and NN, and can be used 
by the client to authenticate to NN on subsequent accesses. Authentication 
using the secret key is done using the DIGEST-MD5 protocol, which doesn't 
involve any third party, such as Kerberos KDC (key distribution center). The 
client can also delegate the secret key to others, so that they may use the key 
to authenticate to NN as the client. This is useful in cases where an M/R 
job needs to access NN as the job owner. Hereinafter, we refer to the secret 
key as a *delegation token*. The reasons for introducing delegation tokens (and 
the associated DIGEST-MD5 mechanism) are as follows.
** *Performance* On a Map/Reduce cluster, there can be thousands of Tasks 
running at the same time. If they use Kerberos to authenticate to a NN, they 
need either a delegated TGT (ticket granting ticket) or a delegated service 
ticket. If using delegated TGT, the Kerberos KDC could become a bottleneck, 
since each task needs to get a Kerberos service ticket from the KDC using the 
delegated TGT. Using delegation tokens avoids that network traffic to the 
KDC. Another option is to use a delegated service ticket. Delegated service 
tickets can be used in much the same way as delegation tokens, i.e., without 
the need to contact an online third party like the KDC. However, Java GSS-API 
doesn't support service ticket delegation. We may need to use a third-party 
(native) Kerberos library, which requires significantly more development 
effort and makes the code less portable.
** *Credential renewal* For Tasks to use Kerberos, the Task owner's Kerberos 
TGT or service ticket needs to be delegated and made available to the Tasks. 
Both the TGT and the service ticket can be renewed for long-running jobs (up to 
the max lifetime set at initial issue). However, during Kerberos renewal, a new TGT 
or service ticket will be issued, which needs to be distributed to all running 
Tasks. If using delegation tokens, the renewal mechanism can be designed in 
such a way that only the validity period of a token is extended on the NN, but 
the token itself stays the same. Hence, no new tokens need to be issued and 
pushed to running Tasks. Moreover, Kerberos tickets have to be renewed before 
the current validity period expires, which puts a timing constraint on the 
renewal operation. Our delegation tokens can be renewed (or revived) by the 
designated renewer even after the current validity period expires, as long as 
the max lifetime has not been reached. Being able to renew an expired 
delegation token is not considered a big risk since (unlike Kerberos) only the 
designated renewer can renew a token. A stolen token can't be renewed by the 
attacker. 
** *Less damage when credential is compromised* A user's Kerberos TGT may be 
used to access services other than HDFS. If a delegated TGT is used and 
compromised, the damage is greater than with an HDFS-only credential 
(a delegation token). A delegated service ticket, on the other hand, is 
equivalent to a delegation token in this respect, since it is also limited to 
a single service.
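
To make the second approach more concrete, here is a rough sketch of a DIGEST-MD5 exchange using the standard Java SASL API, run in a single process. It is not the planned implementation. The class name, the "hdfs" protocol name, the host name and the idea of passing a token identifier as the SASL user name are all assumptions made for illustration. The point it shows is that both sides only need the shared secret and no KDC is contacted.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

final class DigestMd5Sketch {
    // Hypothetical names, used only for this illustration.
    static final String PROTOCOL = "hdfs";
    static final String SERVER = "namenode.example.com";

    /** Returns true if a client holding clientSecret authenticates to a server holding serverSecret. */
    static boolean authenticate(String tokenId, String clientSecret, String serverSecret) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, "auth"); // authentication only, no integrity or privacy layer

        // Client side: supplies the (assumed) token identifier and the shared secret.
        CallbackHandler clientHandler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName(tokenId);
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(clientSecret.toCharArray());
                } else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText()); // accept the realm offered by the server
                }
            }
        };

        // Server side: looks up the secret it holds and authorizes the caller as itself.
        CallbackHandler serverHandler = callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(serverSecret.toCharArray());
                } else if (cb instanceof AuthorizeCallback) {
                    AuthorizeCallback ac = (AuthorizeCallback) cb;
                    ac.setAuthorized(ac.getAuthenticationID().equals(ac.getAuthorizationID()));
                }
            }
        };

        try {
            SaslServer server =
                Sasl.createSaslServer("DIGEST-MD5", PROTOCOL, SERVER, props, serverHandler);
            SaslClient client = Sasl.createSaslClient(
                new String[] {"DIGEST-MD5"}, null, PROTOCOL, SERVER, props, clientHandler);
            if (server == null || client == null) {
                return false; // mechanism not available in this JRE
            }
            byte[] challenge = server.evaluateResponse(new byte[0]); // server sends nonce
            byte[] response = client.evaluateChallenge(challenge);   // client proves it knows the secret
            byte[] rspAuth = server.evaluateResponse(response);      // server verifies, proves itself
            client.evaluateChallenge(rspAuth);                       // client verifies the server
            return server.isComplete() && client.isComplete();
        } catch (SaslException e) {
            return false; // e.g. the secrets did not match
        }
    }
}
```

In a real deployment the two halves would of course run on the client and on NN, with the challenge and response bytes carried over the RPC connection.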

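The renewal rules described under *Credential renewal* could look roughly like the following on the NN side. This is only a sketch under my own assumptions. The class, method and field names are hypothetical, time is passed in explicitly as a plain number, and the secret key itself is left out. It shows that renewal only moves the expiry held by NN, that only the designated renewer may do it, and that an expired token can be revived until its max lifetime.

```java
import java.util.HashMap;
import java.util.Map;

final class DelegationTokenSketch {
    // Per-token state kept on the server; the token held by Tasks never changes.
    private static final class Entry {
        final String owner;
        final String renewer;
        final long maxDate; // hard upper bound, fixed at issue time
        long expiry;        // current validity end, moved forward by renew()

        Entry(String owner, String renewer, long maxDate, long expiry) {
            this.owner = owner;
            this.renewer = renewer;
            this.maxDate = maxDate;
            this.expiry = expiry;
        }
    }

    private final Map<String, Entry> tokens = new HashMap<>();
    private final long renewInterval;
    private final long maxLifetime;
    private int nextId = 1;

    DelegationTokenSketch(long renewInterval, long maxLifetime) {
        this.renewInterval = renewInterval;
        this.maxLifetime = maxLifetime;
    }

    /** Issues a token to owner and records who is allowed to renew it. */
    String issue(String owner, String renewer, long now) {
        String id = "token-" + nextId++;
        long maxDate = now + maxLifetime;
        tokens.put(id, new Entry(owner, renewer, maxDate, Math.min(now + renewInterval, maxDate)));
        return id;
    }

    /** A token is usable only inside its current validity period. */
    boolean isValid(String id, long now) {
        Entry e = tokens.get(id);
        return e != null && now < e.expiry;
    }

    /**
     * Extends the validity period. Allowed even after expiry, but only for the
     * designated renewer and only before the max lifetime is reached.
     */
    boolean renew(String id, String caller, long now) {
        Entry e = tokens.get(id);
        if (e == null || !e.renewer.equals(caller) || now >= e.maxDate) {
            return false;
        }
        e.expiry = Math.min(now + renewInterval, e.maxDate);
        return true;
    }
}
```

For example, with a renew interval of 100 and a max lifetime of 1000, a token issued at time 0 is no longer valid at time 150, a renew call by anyone other than the designated renewer is rejected, and a renew call by the designated renewer at time 150 makes the same token valid again until 250.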

> Adding user and service-to-service authentication to Hadoop
> -----------------------------------------------------------
>
>                 Key: HADOOP-4343
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4343
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>
> Currently, Hadoop services do not authenticate users or other services. As a 
> result, Hadoop is subject to the following security risks.
> 1. A user can access an HDFS or M/R cluster as any other user. This makes it 
> impossible to enforce access control in an uncooperative environment. For 
> example, file permission checking on HDFS can be easily circumvented.
> 2. An attacker can masquerade as Hadoop services. For example, user code 
> running on a M/R cluster can register itself as a new TaskTracker.
> This JIRA is intended to be a tracking JIRA, where we discuss requirements, 
> agree on a general approach and identify subtasks. Detailed design and 
> implementation are the subject of those subtasks.
