Re: Make Hadoop run more securely in Public Cloud environment

Xianqing Yu Thu, 13 Sep 2012 12:04:39 -0700

Hi Kingshuk,

Thank you for your interest.

I think you have given a very good example. If a healthcare company pushes its data to a public cloud, byte-level access control can minimize the data that each party (e.g. a task process) can get. So even if one task process or TaskTracker is hacked, the information loss is minimized.
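To make the byte-level idea concrete, here is a minimal sketch, assuming a hypothetical token format in which the trusted issuer (e.g. the NameNode) signs a (user, block ID, byte range) grant with HMAC-SHA256, and the DataNode serves a read only if it falls inside the signed range. The names `ByteRangeToken`, `issue_token` and `check_read` are illustrative only; they are not Hadoop APIs and not necessarily how the prototype described in this thread works.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRangeToken:
    """Hypothetical token granting `user` access to bytes [start, end) of one block."""
    user: str
    block_id: int
    start: int
    end: int
    mac: bytes = b""

    def payload(self) -> bytes:
        # Unambiguous encoding of every field covered by the MAC.
        return json.dumps([self.user, self.block_id, self.start, self.end]).encode()


def issue_token(key: bytes, user: str, block_id: int, start: int, end: int) -> ByteRangeToken:
    """Issuer side (assumed to be the trusted NameNode): sign the granted byte range."""
    unsigned = ByteRangeToken(user, block_id, start, end)
    mac = hmac.new(key, unsigned.payload(), hashlib.sha256).digest()
    return ByteRangeToken(user, block_id, start, end, mac)


def check_read(key: bytes, token: ByteRangeToken, block_id: int, offset: int, length: int) -> bool:
    """Verifier side (assumed to be a DataNode): accept only reads inside the signed range."""
    expected = hmac.new(key, token.payload(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, token.mac):
        return False  # forged or tampered token
    if token.block_id != block_id or length < 0:
        return False
    return token.start <= offset and offset + length <= token.end
```

A task granted bytes 100 to 199 of block 42 can read any sub-range of them, but a request reaching past byte 199, a request for another block, or a token whose range was edited after signing is rejected, so a compromised task process leaks at most the bytes it was granted.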

Another feature is also very helpful in this scenario. Currently, the NameNode and all DataNodes share the same key to generate Block Access Tokens. If a hacker obtains the key by compromising any one HDFS machine, he or she can potentially read everything in HDFS, and the impact is huge. So I redesigned this so that if a hacker succeeds in attacking one machine, he or she can only get what is on that machine, not what is on the other machines in the cluster.
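One way to get this "one machine, one key" property is sketched below, assuming (this is not necessarily the redesign described in this thread) that only the NameNode keeps a master key and derives a separate key per DataNode with HMAC-SHA256, signing each block token with the key of the DataNode that stores the block. A DataNode holds only its own derived key, and because HMAC behaves as a pseudorandom function, that key reveals nothing about the keys of other DataNodes. All function names are hypothetical.

```python
import hashlib
import hmac


def derive_datanode_key(master_key: bytes, datanode_id: str) -> bytes:
    """Runs only on the trusted NameNode: derive a per-DataNode key from the master key."""
    return hmac.new(master_key, b"datanode-key:" + datanode_id.encode(), hashlib.sha256).digest()


def sign_block_token(datanode_key: bytes, user: str, block_id: int) -> bytes:
    """MAC a (user, block) grant with the key of the DataNode that stores the block."""
    msg = f"{user}:{block_id}".encode()
    return hmac.new(datanode_key, msg, hashlib.sha256).digest()


def verify_block_token(datanode_key: bytes, user: str, block_id: int, mac: bytes) -> bool:
    """Runs on a DataNode, which only ever holds its own derived key."""
    return hmac.compare_digest(sign_block_token(datanode_key, user, block_id), mac)
```

With a single cluster-wide key, a key stolen from any DataNode would verify everywhere; here a token forged with the key stolen from `dn1` is rejected by `dn2`. The trade-off in this sketch is that a client reading replicas on several DataNodes needs a separate token for each of them.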

A secure (encrypted) channel for transferring data can be another security bonus.


Thanks,

Xianqing

-----Original Message-----
From: Kingshuk Chatterjee
Sent: Thursday, September 13, 2012 2:23 PM
To: 'Peng Ning'
Cc: [email protected]
Subject: RE: Make Hadoop run more securely in Public Cloud environment

Hi Xianqing -

I am a systems architect and a consultant for the healthcare industry, and my first impression from your email below is that byte-level security can be a very helpful feature for securing patients' health information (PHI) and for giving healthcare service providers the assurance they need to push their data to the public cloud.


I will be happy to contribute in any way; let me know.

Regards//K

Kingshuk Chatterjee
Director, Technology Consulting
--------------------------------------------------------------------------------
5155 Rosecrans Ave, Suite 250               http://www.calance.com
Hawthorne, CA 90250                                +1-(412 606 8582)

-----Original Message-----
From: Xianqing Yu [mailto:[email protected]]
Sent: Thursday, September 13, 2012 11:19 AM
To: [email protected]
Cc: Peng Ning
Subject: Make Hadoop run more securely in Public Cloud environment

Hi Hadoop community,

I am a Ph.D. student at North Carolina State University. I am modifying Hadoop's code (including most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security.

My major goal is to make Hadoop run more securely in the cloud, especially in public cloud environments. To achieve that, I redesigned the current security mechanism to provide the following properties:

1. Bring byte-level access control to Hadoop HDFS. In 0.20.204, HDFS access control works at user or block granularity: e.g. the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. I made Hadoop able to do byte-granularity access control, so each accessing party, whether a user or a task process, can only access the minimum bytes it needs.

2. I assume that in a public cloud environment, only the NameNode, Secondary NameNode, and JobTracker can be trusted. Some of the large number of DataNodes and TaskTrackers may be compromised, since some of them may be running in less secure environments. So I redesigned the security mechanism to minimize the damage a hacker can do.

a. Redesign the Block Access Token to solve the widely-shared-key problem of HDFS. In the original Block Access Token design, all of HDFS (NameNode and DataNodes) shares one master key to generate Block Access Tokens. If one DataNode is compromised, the hacker can get the key and generate any Block Access Token he or she wants.

b. Redesign the HDFS Delegation Token to do fine-grained access control for TaskTrackers and MapReduce task processes on HDFS.

In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to access any MapReduce files on HDFS, so they have the same privileges as the JobTracker to read or write tokens, copy job files, etc. However, if one of them is compromised, every critical item in the MapReduce directory (job files, Delegation Tokens) is exposed to the attacker. I solve the problem by making the JobTracker decide which TaskTracker can access which file in the MapReduce directory on HDFS.

Once a task process gets an HDFS Delegation Token, it can access everything belonging to that job or user on HDFS. In my design, it can only access the bytes it needs from HDFS.

There are some other security improvements: for example, a TaskTracker cannot learn certain information, such as the block ID, from the Block Token (because my design encrypts it), and HDFS can optionally set up a secure channel to send data.
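The JobTracker-side decision in item 2b (which TaskTracker may access which file in the MapReduce directory) can be pictured as a grant table. The sketch below is a hypothetical illustration, not the prototype's code: the JobTracker grants a TaskTracker only the files of the tasks it assigned to it and revokes them when the job finishes. The class name and file names are illustrative assumptions.

```python
from collections import defaultdict


class MapReduceDirAcl:
    """Hypothetical JobTracker-side table of which TaskTracker may access
    which file in the MapReduce system directory on HDFS."""

    def __init__(self) -> None:
        self._grants: dict[str, set[str]] = defaultdict(set)

    def grant_for_task(self, tracker: str, job_id: str, files: list[str]) -> None:
        # Called when the JobTracker assigns a task of job_id to this tracker:
        # only the listed files are granted, never the whole directory.
        for name in files:
            self._grants[tracker].add(f"{job_id}/{name}")

    def revoke_job(self, job_id: str) -> None:
        # Drop every grant belonging to a finished job, on every tracker.
        prefix = f"{job_id}/"
        for paths in self._grants.values():
            paths.difference_update({p for p in paths if p.startswith(prefix)})

    def may_access(self, tracker: str, path: str) -> bool:
        # .get() avoids creating an empty entry for unknown trackers.
        return path in self._grants.get(tracker, set())
```

Under this scheme a compromised TaskTracker can read only the files of tasks it was actually assigned, rather than every job file and Delegation Token in the directory.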

With these features, Hadoop can run much more securely in an untrusted environment such as a public cloud. I have already started testing my prototype. I would like to know whether the community is interested in my work. Would it be valuable to contribute to production Hadoop?

I created a JIRA issue for the discussion: https://issues.apache.org/jira/browse/HADOOP-8803#comment-13455025


Thanks,

Xianqing
