[
https://issues.apache.org/jira/browse/HADOOP-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693563#comment-13693563
]
Sanjay Radia commented on HADOOP-9671:
--------------------------------------
Here is an initial draft of Hadoop security usage scenarios, the threat model,
and the problems that we would like to address.
*Hadoop Deployment Usage Scenarios*
The use cases below have two variations: with and without perimeter security
(such as Knox).
* U1 Hadoop insecure deployment (i.e., using UGI-based “authentication”)
* U2 Hadoop deployment with Active Directory (Kerberos, LDAP) authentication
* U3 Hadoop deployment with Kerberos authentication
* U4 Hadoop deployment in an LDAP-only shop
* U5 Hadoop deployment in public Cloud (e.g. AWS, Azure, Rackspace)
* U6 Multiple Hadoop clusters in a single organization, each with different
authentication requirements and potentially a different IdP.
*Security Threat Model for Hadoop*
(This list is an extension of the list published in
http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf)
# An unauthorized client may access an HDFS file via the RPC or HTTP
protocols.
# An unauthorized client may read/write a data block of a file at a DataNode via
the pipeline streaming data-transfer protocol.
# An unauthorized user may submit a job to a queue, or delete or change the
priority of a job.
# An unauthorized client may access intermediate data of a Map job via its task
tracker's HTTP shuffle protocol.
# An executing task may use the host operating system interfaces to access
other tasks, or to access local data, including intermediate Map output or the
local storage of the DataNode that runs on the same physical node.
# A task may masquerade as a Hadoop service component such as a DataNode,
NameNode, job tracker, task tracker etc.
# A user may submit a workflow to Oozie as another user.
# A service may attempt to impersonate a user by using the client-presented
service access token
# A service may attempt to impersonate another service by using the
service-presented service access token (when a service is acting as a client of
another)
# A user may attempt to register as a service through service registration
endpoints (is this the same as 6?)
*Hadoop Security Problems*
# Perimeter security solution - Knox addresses this
# Remove the need to create Unix accounts on each compute node - (note Unix
accounts are merely for isolation and not for authentication.) Linux containers
have the potential to fix this.
# Remove the need for root startup for Datanodes (HDFS-2856)
# Server authentication setup is painful, i.e. installing keytabs for each
server. We need a simpler solution for server-server mutual authentication
(e.g. NN-DN) and client-server mutual authentication.
# Authentication for customers with only LDAP (both SSO jiras, HADOOP-9392 and
HADOOP-9533, are addressing this)
# Hadoop authentication should include group membership so that group
membership checking is not needed later. Note this is critical for public cloud
deployment, where it is not practical to call back from the cloud to the
customer’s environment to get group membership. (Both SSO jiras, HADOOP-9392
and HADOOP-9533, are addressing this.) Related to problem 12.
# Remove the shared secret between NN and DN (potentially extensions to the SSO
jiras)
# Remove the need for NN and JT delegation tokens (potentially extensions to
the SSO jiras)
# Encryption on communication pipes - verify configurations and test
# Encryption on data. One solution is to use OS-level encryption; someone needs
to verify and test this.
# Add ACLs to HDFS
# Change Hadoop tokens to include group membership - see the Azure use case U5
above. Hadoop tokens need to support arbitrary attributes for ABAC.
# Implementation improvements and bugs
** Change the Hadoop security implementation so that UGI (i.e., non-secure
Hadoop deployment) uses delegation tokens and block access tokens. (HADOOP-8779)
** Change the implementation of Hadoop RPC security to make the authentication
pluggable - note that architecturally Hadoop RPC authentication is pluggable,
but the code has UGI and Kerberos too deeply baked in.
# Provide the ability to identify poorly or maliciously behaving applications,
independently of applications from the same user that may be behaving
properly. Note this is not a security issue per se, but we lack an
application/job identity that could be used to throttle a misbehaving
application. The Hadoop job/HDFS delegation token could be used for that
purpose - is this a reasonable use for it?
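
To make problems 6 and 12 concrete, here is a minimal sketch of a token that carries group membership and arbitrary attributes, so a service can make a group or ABAC decision without calling back to the customer's directory. It is not Hadoop code: the names issue_token, verify_token, is_allowed and SIGNING_KEY are hypothetical, and HMAC with a shared key is used only to keep the example short (a real design would more likely use public-key signatures, given that problem 7 wants to remove shared secrets).

```python
import base64
import hashlib
import hmac
import json

# Hypothetical key, for illustration only. A real design might use
# public-key signatures so that verifying services hold no secret.
SIGNING_KEY = b"demo-key-not-for-production"


def issue_token(user, groups, attributes, key=SIGNING_KEY):
    """Serialize identity, groups and attributes, then append an HMAC."""
    payload = json.dumps(
        {"user": user, "groups": sorted(groups), "attrs": attributes},
        sort_keys=True,
    ).encode("utf-8")
    mac = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload) + b"." + base64.urlsafe_b64encode(mac)


def verify_token(token, key=SIGNING_KEY):
    """Return the claims if the HMAC matches, otherwise raise ValueError."""
    try:
        payload_b64, mac_b64 = token.split(b".")
        payload = base64.urlsafe_b64decode(payload_b64)
        mac = base64.urlsafe_b64decode(mac_b64)
    except Exception as exc:
        raise ValueError("malformed token") from exc
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("bad token signature")
    return json.loads(payload)


def is_allowed(claims, required_group=None, required_attrs=None):
    """Decide access from the token's own claims; no directory lookup."""
    if required_group is not None and required_group not in claims["groups"]:
        return False
    for name, value in (required_attrs or {}).items():
        if claims["attrs"].get(name) != value:
            return False
    return True
```

A service would call verify_token once per request and then pass the returned claims to is_allowed, for example is_allowed(claims, required_group="etl") for a group check or is_allowed(claims, required_attrs={"dept": "finance"}) for an attribute check.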
> Improve Hadoop security - master jira
> -------------------------------------
>
> Key: HADOOP-9671
> URL: https://issues.apache.org/jira/browse/HADOOP-9671
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Sanjay Radia
> Assignee: Sanjay Radia
>