hmar...@umbc.edu wrote:
Steve,

Security through obscurity is always good practice from a development
standpoint, and one of the reasons why catching you out is such an easy task.

:)

My most recent presentation on HDFS clusters is now online; notice how it
doesn't gloss over the security issues: http://www.slideshare.net/steve_l/hdfs-issues

Please keep hiding relevant details from people in order to keep everyone
smiling.


HDFS is as secure as NFS: you are trusted to be who you say you are. That means you have to run it on a secured subnet, with access restricted to trusted hosts and/or one or two front-end servers, or accept that your dataset is readable and writable by anyone on the network.
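To be concrete about "who you say you are": on a pre-security cluster the client-side identity is essentially just the local Unix login. A rough sketch of the idea, not the exact Hadoop code path:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;

  public class WhoAmI {
      public static void main(String[] args) throws Exception {
          // The identity is whatever the local OS reports for the
          // current user; nothing stronger than this.
          Process p = Runtime.getRuntime().exec("whoami");
          BufferedReader r = new BufferedReader(
                  new InputStreamReader(p.getInputStream()));
          String user = r.readLine();
          // This string goes to the NameNode as-is; the server has no
          // way to verify it. That's the NFS-style trust model.
          System.out.println("Claimed identity: " + user);
      }
  }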

There is user identification going in; it is currently at the level where it will stop someone who lacks the rights from accidentally deleting the entire filesystem. Which has been known to happen.
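For illustration, a rough sketch of what that protection amounts to, assuming permissions are switched on (dfs.permissions=true); the NameNode hostname here is made up:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class DeleteGuard {
      public static void main(String[] args) throws IOException {
          Configuration conf = new Configuration();
          conf.set("fs.default.name", "hdfs://namenode:8020"); // hypothetical
          FileSystem fs = FileSystem.get(conf);
          try {
              // The accidental case: a recursive delete of the root.
              fs.delete(new Path("/"), true);
          } catch (IOException e) {
              // With permissions on, the NameNode rejects the call
              // unless the (self-reported) user has write access.
              // Enough to stop accidents, not attackers.
              System.err.println("Blocked: " + e.getMessage());
          }
      }
  }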

If the team looking after the cluster demands separate SSH keys/logins for every machine, not only are they driving up their operations costs, it's moot once you have the HDFS cluster and MR engine live. You can push work out to the JobTracker, which then runs it on the machines under whatever userid the TaskTrackers are running as. Now, 0.20+ will run it under the identity of the user who claims to be submitting the job, but without that, your MR jobs get the filesystem access rights of the user running the TT.

It's also fairly straightforward to create a modified Hadoop client JAR that doesn't call whoami to get the userid and instead spoofs being anyone. Which means that even if you lock down the filesystem (no out-of-datacentre access), if I can run my Java code as MR jobs in your cluster, I have unrestricted access to the filesystem by way of the TaskTracker server.
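To show how little the client-side check buys you, here's a minimal sketch of the spoofing, assuming a pre-security (0.20-era or earlier) cluster. If I remember right, you don't even need to patch the JAR: the old client reads its identity from the hadoop.job.ugi property before falling back to whoami. The hostname and the claimed user/group below are made up:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SpoofSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          conf.set("fs.default.name", "hdfs://namenode:8020"); // hypothetical
          // Pre-security clients take their identity from this property
          // ("user,group,...") before calling whoami; the NameNode never
          // verifies it, so we can simply claim to be the superuser.
          conf.set("hadoop.job.ugi", "hadoop,supergroup");
          FileSystem fs = FileSystem.get(conf);
          // The NameNode now treats us as that user: full read/write.
          for (FileStatus status : fs.listStatus(new Path("/"))) {
              System.out.println(status.getPath());
          }
      }
  }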

But Hal, if you are running Ant for your build, I'm running my code on your machines anyway, so you had better be glad that I'm not malicious.

-Steve
