The current best practice is to firewall off your cluster, configure a
SOCKS proxy/gateway, and only allow traffic to the cluster from the
gateway. Being able to SSH into the gateway provides authentication.
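As a sketch of the client side (the gateway hostname and port below are placeholders, not from the post itself; the two keys are Hadoop's standard SOCKS socket-factory settings in core-site.xml):

```xml
<!-- core-site.xml on the client machine. All Hadoop RPC is routed
     through a local SOCKS proxy, which you open first with e.g.:
       ssh -D 1080 user@gateway.example.com
     (gateway.example.com and port 1080 are placeholders) -->
<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.SocksSocketFactory</value>
</property>
<property>
  <name>hadoop.socks.server</name>
  <value>localhost:1080</value>
</property>
```

With the cluster firewalled, only traffic arriving via the gateway's SSH tunnel reaches the nodes, so holding an SSH key for the gateway becomes the effective credential.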

See 
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
for a description of how we accomplished this.

- Aaron

On Thu, Jul 23, 2009 at 3:23 AM, Steve Loughran <[email protected]> wrote:
> Ted Dunning wrote:
>>
>> Last I heard, the API could be suborned in this scenario.  Real credential
>> based identity would be needed to provide more than this.
>>
>> The hack would involve a changed hadoop library that lies about identity.
>> This would not be difficult to do.
>>
>> On Wed, Jul 22, 2009 at 11:45 PM, Mathias Herberts <
>> [email protected]> wrote:
>>
>>> You can simply set up some bastion hosts which are trusted and from
>>> which jobs can be run.
>>>
>>> Then let users connect to these hosts using a secure mechanism such as
>>> SSH keys.
>>>
>>> You can then create users/groups on those bastion hosts and have
>>> permissions on your HDFS files that use those credentials.
>>>
>
> There's no wire security; nothing stops me from pushing packets straight to
> a datanode and claiming to be whoever I like.
>
> Even if you lock down access to the cluster so that I don't have direct
> access to the nodes, if I can run an MR job in the cluster, I can gain full
> administrative rights, by virtue of the fact that the cluster runs my Java
> code on one of its nodes, a node which must have direct access to the rest
> of the cluster.
>
> The details are left as an exercise for the reader.
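For reference, the per-user HDFS permission setup Mathias describes uses the standard FileSystem shell commands; the user, group, and path below are placeholders, and (as Ted and Steve note) these permissions only restrain clients that honestly report their identity:

```shell
# Placeholders throughout; run as the HDFS superuser on a bastion host.
hadoop fs -mkdir /user/alice                  # home directory for user "alice"
hadoop fs -chown alice:analysts /user/alice   # owner alice, group analysts
hadoop fs -chmod 750 /user/alice              # group may read, others get nothing
```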
