The current best practice is to firewall off your cluster, configure a SOCKS proxy on a gateway host, and only allow traffic into the cluster from that gateway. SSH access to the gateway then provides the authentication step.
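A sketch of the client side of that setup, assuming an SSH dynamic forward on local port 1080 (`ssh -D 1080 user@gateway`); the gateway host name is illustrative, and the property names are the ones Hadoop 0.18/0.20 use for pluggable socket factories:

```xml
<!-- hadoop-site.xml on the client, outside the firewall -->
<configuration>
  <!-- Route all RPC connections through a SOCKS proxy
       instead of opening sockets directly -->
  <property>
    <name>hadoop.rpc.socket.factory.class.default</name>
    <value>org.apache.hadoop.net.SocksSocketFactory</value>
  </property>
  <!-- The local end of the SSH dynamic forward to the gateway -->
  <property>
    <name>hadoop.socks.server</name>
    <value>localhost:1080</value>
  </property>
</configuration>
```

With this in place, only the gateway needs network access to the NameNode and JobTracker; client traffic is tunneled over the authenticated SSH session.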
See http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ for a description of how we accomplished this.

- Aaron

On Thu, Jul 23, 2009 at 3:23 AM, Steve Loughran <[email protected]> wrote:
> Ted Dunning wrote:
>>
>> Last I heard, the API could be suborned in this scenario. Real
>> credential-based identity would be needed to provide more than this.
>>
>> The hack would involve a changed Hadoop library that lies about
>> identity. This would not be difficult to do.
>>
>> On Wed, Jul 22, 2009 at 11:45 PM, Mathias Herberts
>> <[email protected]> wrote:
>>
>>> You can simply set up some bastion hosts which are trusted and from
>>> which jobs can be run.
>>>
>>> Then let users connect to these hosts using a secure mechanism such
>>> as SSH keys.
>>>
>>> You can then create users/groups on those bastion hosts and have
>>> permissions on your HDFS files that use those credentials.
>>>
>
> There's no wire security, nothing to stop me pushing packets straight
> to a datanode, claiming to be whoever I like.
>
> Even if you lock down access to the cluster so that I don't have
> direct access to the nodes, if I can run an MR job in the cluster, I
> can gain full administrative rights, by virtue of the fact that the
> cluster is running my Java code on one of its nodes, a node which must
> have direct access to the rest of the cluster.
>
> The details are left as an exercise for the reader.
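To make the "lies about identity" point concrete: in pre-Kerberos Hadoop, the client simply asserts who it is (by default, the local Unix username), and the NameNode believes it. You don't even need a modified library; the identity can be overridden with a client-side property. A hedged illustration (property name and group as in Hadoop 0.20's simple-auth scheme; the path is made up):

```
# Any client that can reach the NameNode can claim to be the
# superuser; "hdfs,supergroup" is user and group, asserted
# entirely client-side with no verification by the cluster.
hadoop fs -Dhadoop.job.ugi=hdfs,supergroup -rmr /data/private
```

This is exactly why the gateway/bastion approach only restricts *who can talk to the cluster*, not *who they claim to be* once connected.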
