Amandeep Khurana wrote:
Thanks for the feedback Steve.

My responses to the points you mentioned are written inline below.

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Thu, Mar 19, 2009 at 4:31 AM, Steve Loughran <ste...@apache.org> wrote:

Amandeep Khurana wrote:

Apparently, the attached file was stripped off. Here's the link where you
can get it:
http://www.soe.ucsc.edu/~akhurana/Hadoop_Security.pdf

Amandeep



This is a good paper, with test data to go alongside the theory.
Introduction
========
-I'd cite NFS as a good equivalent design, the same "we trust you to be who
you say you are" protocol, similar assumptions about the network ("only
trusted machines get on it")
-If EC2 does not meet these requirements, you could argue it's the fault of
EC2; there's no fundamental reason why it can't offer private VPNs for
clusters the way other infrastructure (VMWare) can
-the whoami call is done by the command line client; different clients
don't even have to do that. Mine doesn't.
-it is not the "superuser" in the Unix sense, "root", that runs jobs; it is
whichever user started Hadoop on that node. It can still be a locked-down
user with limited machine rights.


I'll look into the NFS security stuff in detail and then add it later.


The key point about NFS security is that there was none, because in the early eighties the idea of a Linux laptop getting on your wifi network was not conceivable, so you really could trust workstations. It was only with PC-NFS that the assumptions started to fail.


Where did EC2 come into the picture?

It's an example of a place where Hadoop is deployed and where the assumption that only trusted users have network access (and/or that only fixed IP addresses can join the cluster) doesn't hold.


Yes, the whoami can be bypassed; that's why there is the whole thing around
authentication.
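
To illustrate why a self-reported whoami is not authentication, here is a minimal Python sketch. It is NOT the scheme from the paper; the function names and the shared-secret HMAC approach are assumptions made purely for illustration. A bare username can be typed in by anyone, whereas a tag computed with a secret the server also holds can at least be checked.

```python
import hashlib
import hmac


def sign_identity(username: str, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the username to a shared secret.

    Hypothetical helper: the paper's actual key format is not shown here.
    """
    return hmac.new(secret, username.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_identity(username: str, tag: str, secret: bytes) -> bool:
    """Server-side check: recompute the tag and compare in constant time."""
    expected = sign_identity(username, secret)
    return hmac.compare_digest(expected, tag)
```

A client that only claims a name has no valid tag for it, so claiming "root" with somebody else's tag fails the check. A sketch like this does nothing against replay of a captured tag, which is where the timestamp questions further down come in.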

By superuser, I meant the user who starts the hadoop instance... Will make
it clearer in the writing.

OK




Attacks
====
Add
-unauthorised nodes spoofing other IP addresses (via ARP attacks) and
becoming nodes in the cluster. They could acquire and then keep or destroy
data, or pretend to do work and return false values. Or come up as a spoof
namenode or datanode and disrupt all work.
-denial of service attacks: too many heartbeats, etc
-spoof clients running malicious code on the tasktrackers.


I haven't looked at these attacks. This paper is not focusing on them. They can
definitely be looked at and incorporated at a later stage. Let's go step by
step. (Debatable)

I was just broadening the list of attacks. A spoofed node joining the cluster is something to fear.


Protocol
======
-SSL does need to deal with trust; unless you want to pay for every server
certificate (you may be able to share them), you'll
need to set up your own CA and issue private certs, leaving you with the
problem of securely distributing CA public keys and getting SSL private
keys out to nodes securely (without anything on the net being able to use your
kickstart server to boot a VM with the same MAC address as a trusted server
just to get at those keys)
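
The "own CA" setup above can be sketched on the client side with Python's standard ssl module. This is only an illustration under assumptions: the function name and the "cluster-ca.pem" file name are hypothetical, and it says nothing about how the CA certificate reaches the node, which is exactly the distribution problem described above.

```python
import ssl


def make_client_context(ca_file=None):
    """Build a TLS client context that trusts only an explicitly loaded CA.

    ca_file is a hypothetical path to the private CA's public certificate.
    """
    # PROTOCOL_TLS_CLIENT enables hostname checking and CERT_REQUIRED by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file is not None:
        # Trust only the private CA; the system trust store is not loaded.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


# Usage (assumes the file exists on the node):
# ctx = make_client_context("cluster-ca.pem")
```

With no CA loaded at all, every server certificate fails verification, so a node that never received the CA certificate cannot be talked into trusting an arbitrary server.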


SSL is a possible solution, but the details aren't the focus of this design.
Regarding the other keys, there is a format around which they are created,
and you don't need a CA for that.



-I'll have to get somebody who understands security protocols to review the
paper. One area I'd flag as trouble is that on virtual machines, clock drift
can be choppy and non-linear. You also have to worry about clients not being
in the right time zone. It is good for everything to work off one clock (say
the namenode) rather than their own. Amazon's S3 authentication protocol has
this bug, as do the bits of WS-DM which take absolute times rather than
relative ones (presumably to make operations idempotent). At the very least,
the namenode needs an operation to return its current time, which callers
can then work off.
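
The difference between absolute and relative times can be sketched as follows. This is a minimal Python illustration, not anything from Hadoop or the paper: the Lease class and its parameters are hypothetical. The server hands out "valid for N seconds", and the holder counts those seconds on its own monotonic clock, so neither wall-clock skew nor a wrong time zone matters.

```python
import time


class Lease:
    """A validity period expressed as a relative lifetime (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, now=time.monotonic):
        # 'now' is injectable so the clock can be replaced in tests.
        self._now = now
        # The deadline is on the local monotonic clock, not the wall clock.
        self._deadline = now() + ttl_seconds

    def remaining(self) -> float:
        """Seconds left before expiry, never negative."""
        return max(0.0, self._deadline - self._now())

    def expired(self) -> bool:
        return self._now() >= self._deadline
```

The trade-off is that network latency between issue and receipt eats into the lifetime unnoticed, so the sketch suits lifetimes that are long compared to a round trip.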


The time issue is definitely a concern and has to be solved somehow. The
namenode giving its time is a good idea. But the sync would still be
important. There is a way to sync the time across the cluster. I don't
remember it clearly, but I have it on my "little" cluster. I'll look that
up.


NTP is the normal protocol; everyone tries to use it. But asking the NN for its clock would avoid having to rely on everything being in sync at the OS level, and would let the client detect when its clock had drifted too far off for a conversation. One recurrent problem of mine is machines that are on NTP but whose time zones are wrong; they are perfectly accurate to the second but 8 hours out.
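
Asking the namenode for its clock could look roughly like this on the client. It is a sketch under assumptions: no such namenode operation is shown in this thread, so the server time is just passed in as a number, and all function names and thresholds (300 s skew, 60 s tolerance) are made up for illustration. All times are seconds since the epoch.

```python
def clock_offset(server_time: float, sent_at: float, received_at: float) -> float:
    """Estimate (server clock - local clock) from one request/response pair.

    Assumes the server read its clock roughly halfway through the round trip.
    """
    midpoint = (sent_at + received_at) / 2.0
    return server_time - midpoint


def within_skew(offset_seconds: float, max_skew_seconds: float = 300.0) -> bool:
    """True if the local clock is close enough to the server's to proceed."""
    return abs(offset_seconds) <= max_skew_seconds


def looks_like_timezone_error(offset_seconds: float, tolerance: float = 60.0) -> bool:
    """True if the offset is near a non-zero whole number of hours.

    This is the 'accurate to the second but 8 hours out' symptom.
    """
    hours = round(offset_seconds / 3600.0)
    return hours != 0 and abs(offset_seconds - hours * 3600.0) <= tolerance
```

A client would compute the offset once per conversation, refuse to continue if within_skew fails, and could use the time zone check to produce a more useful error message than "clock skew too large".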

-steve
