[ https://issues.apache.org/jira/browse/HADOOP-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462043#comment-13462043 ]

Xianqing Yu commented on HADOOP-8803:
-------------------------------------

Hi Luke,

Even if the block ID is randomly generated, an attacker can still obtain this 
information in other ways instead of guessing. One observation is that although 
the SASL layer can create a confidential (i.e. encrypted) channel, BlockReader 
does not use that layer; instead, it sends the Block Token over the network 
without any protection. The attacker can simply use the compromised machine to 
dump whatever network packets it can observe and extract information such as 
other DataNode addresses and block IDs from the BlockTokenIdentifier. On the 
other side, detecting such brute-force attackers would require monitors on the 
NameNode and on each DataNode; those monitors would need to exchange 
information to decide which DataNode is the bad guy, and that protocol is not 
very simple.
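
For illustration, here is a rough sketch (in plain JDK terms, not the actual 
BlockReader handshake) of what requesting a confidential SASL channel looks 
like; the mechanism, protocol, server name, and credentials below are all 
placeholders I made up for the example:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import javax.security.auth.callback.Callback;
    import javax.security.auth.callback.CallbackHandler;
    import javax.security.auth.callback.NameCallback;
    import javax.security.auth.callback.PasswordCallback;
    import javax.security.sasl.Sasl;
    import javax.security.sasl.SaslClient;
    import javax.security.sasl.SaslException;

    public class ConfidentialChannelSketch {

        // Supplies placeholder credentials; a real client would present the
        // Block Token's identifier and secret here instead.
        static final CallbackHandler HANDLER = new CallbackHandler() {
            public void handle(Callback[] callbacks) throws IOException {
                for (Callback cb : callbacks) {
                    if (cb instanceof NameCallback) {
                        ((NameCallback) cb).setName("block-token-id");
                    } else if (cb instanceof PasswordCallback) {
                        ((PasswordCallback) cb)
                            .setPassword("block-token-secret".toCharArray());
                    }
                }
            }
        };

        public static SaslClient newConfidentialClient() throws SaslException {
            Map<String, String> props = new HashMap<String, String>();
            // "auth-conf" = authentication plus confidentiality: everything
            // sent after the handshake is encrypted, so a packet sniffer on
            // a compromised node cannot read tokens or block IDs off the wire.
            props.put(Sasl.QOP, "auth-conf");
            return Sasl.createSaslClient(
                    new String[] { "DIGEST-MD5" }, // placeholder mechanism
                    null,                          // no authorization id
                    "hdfs",                        // placeholder protocol name
                    "datanode.example.com",        // placeholder server name
                    props, HANDLER);
        }
    }

If the data-transfer path negotiated QOP "auth-conf" like this, the Block 
Token would never cross the wire in cleartext.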

I am concerned about the delegation tokens stored on that node. First, I 
restrict the content a TaskTracker can access in the MapReduce directory on 
HDFS (enforced by the JobTracker), so a TaskTracker can only download a 
special delegation token to the local node. That delegation token gives the 
machine the privilege to access only a defined range of the input file. So if 
the node is compromised, the attacker can only obtain a limited portion of the 
HDFS content, depending on which task is running on that node.
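
To make that concrete, here is a hypothetical sketch of what such a token's 
payload and check could look like (the class and field names are mine for 
illustration, not the actual code in my patch):

    // Hypothetical sketch: a delegation token identifier that carries the
    // byte range a task is allowed to read, plus the check an enforcement
    // point on the HDFS side could apply before serving a read.
    public class ByteRangeTokenIdentifier {
        private final String path;     // input file the task may read
        private final long rangeStart; // first permitted byte (inclusive)
        private final long rangeEnd;   // end of permitted range (exclusive)

        public ByteRangeTokenIdentifier(String path, long rangeStart,
                                        long rangeEnd) {
            this.path = path;
            this.rangeStart = rangeStart;
            this.rangeEnd = rangeEnd;
        }

        // True only if the requested read stays inside the token's range.
        public boolean permits(String requestedPath, long offset, long length) {
            return path.equals(requestedPath)
                    && offset >= rangeStart
                    && offset + length <= rangeEnd;
        }
    }

So a token scoped to, say, one 64 MB input split caps what a compromised node 
can leak at that single split.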

A uniformly configured cluster would weaken my proposal, but that is a 
deployment issue and depends on how users run Hadoop. If someone manages a 
huge cluster, then generally, when the cluster needs an update, the manager 
would not shut down the whole cluster; instead, she or he would update 
machines one by one. And Hadoop can run on different operating systems, e.g. 
different versions of Linux or Windows, so we cannot assume that every machine 
in the cluster is identical (we should keep Hadoop flexible, right?). For 
example, in the future some users may want to combine two clusters to increase 
scale, and those clusters' operating systems can differ.

Thanks for your comments about testing. I will keep them in mind.

About the trade-off between security and performance, I think it really 
depends on what Hadoop users want. I want my design to be truly flexible: if 
users want better security, they can enable it fairly easily; if not, they can 
simply disable it in the configuration file, much as they do today to enable 
the BlockAccessToken. The market is always dynamic, so it is hard to achieve 
perfection.
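
For example, the switch could look like the existing 
dfs.block.access.token.enable flag; a tiny sketch, with a key name I made up:

    import org.apache.hadoop.conf.Configuration;

    public class FeatureGate {
        // Hypothetical key, modeled on dfs.block.access.token.enable; the
        // default is false so existing clusters keep their current behavior.
        public static boolean byteLevelAccessControlEnabled(Configuration conf) {
            return conf.getBoolean("dfs.byte.level.access.control.enable", false);
        }
    }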

Improvements in OS security (e.g. POSIX permissions, ACLs) can be great and 
benefit all applications running on the OS. I think we are both trying to make 
Hadoop more secure, just from different angles. I feel that your goal is to 
make Hadoop fully secured so that no bad guys can get in; my goal is to reduce 
the damage if bad guys do get in.

                
> Make Hadoop run more securely in a public cloud environment
> -----------------------------------------------------------
>
>                 Key: HADOOP-8803
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8803
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, ipc, security
>    Affects Versions: 0.20.204.0
>            Reporter: Xianqing Yu
>              Labels: hadoop
>   Original Estimate: 2m
>  Remaining Estimate: 2m
>
> I am a Ph.D. student at North Carolina State University. I am modifying 
> Hadoop's code (covering most parts of Hadoop, e.g. JobTracker, TaskTracker, 
> NameNode, DataNode) to achieve better security.
>  
> My major goal is to make Hadoop run more securely in a cloud environment, 
> especially a public cloud. To achieve that, I redesign the current security 
> mechanism to provide the following properties:
> 1. Bring byte-level access control to HDFS. In 0.20.204, HDFS access control 
> works at user or block granularity, e.g. the HDFS Delegation Token only 
> checks whether a file can be accessed by a certain user, and the Block Token 
> only proves which block or blocks can be accessed. I make Hadoop capable of 
> byte-granularity access control, so that each access party, whether a user 
> or a task process, can access only the bytes it minimally needs.
> 2. I assume that in a public cloud environment only the NameNode, secondary 
> NameNode, and JobTracker can be trusted. A large number of DataNodes and 
> TaskTrackers may be compromised, since some of them may run in less secure 
> environments. So I redesign the security mechanism to minimize the damage an 
> attacker can do.
>  
> a. Redesign the Block Access Token to solve HDFS's widely-shared-key 
> problem. In the original Block Access Token design, all of HDFS (the 
> NameNode and every DataNode) shares one master key to generate Block Access 
> Tokens; if one DataNode is compromised, the attacker can obtain the key and 
> generate any Block Access Token he or she wants.
>  
> b. Redesign the HDFS Delegation Token to do fine-grained access control on 
> HDFS for TaskTrackers and MapReduce task processes.
>  
> In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to 
> access any MapReduce file on HDFS, so they have the same privileges as the 
> JobTracker to read or write tokens, copy job files, etc. However, if one of 
> them is compromised, everything critical in the MapReduce directory (job 
> files, Delegation Tokens) is exposed to the attacker. I solve the problem by 
> having the JobTracker decide which TaskTracker can access which file in the 
> MapReduce directory on HDFS.
>  
> As for the task process, once it gets an HDFS Delegation Token it can access 
> everything belonging to that job or user on HDFS. Under my design, it can 
> access only the bytes it needs.
>  
> There are some other security improvements as well: for example, the 
> TaskTracker cannot learn information such as the block ID from the Block 
> Token (because I encrypt it), and HDFS can optionally set up a secure 
> channel for sending data.
>  
> With those features, Hadoop can run much more securely in an uncertain 
> environment such as a public cloud. I have already started testing my 
> prototype. I would like to know whether the community is interested in my 
> work. Is it valuable work to contribute to production Hadoop?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
