If the only purpose of the clients is to launch map-reduce jobs, you may be
able to get away with some DNS evil to limit the number of external IPs.
You can also use the diagnostic HTTP interfaces to see data with limited
access.  Other than such severely limited operation, you will be hard
pressed, because the whole point of HDFS is that clients communicate
directly with the datanodes when reading or writing.
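
To make that read path concrete, here is a rough sketch using the standard
Hadoop FileSystem API (the host name, port, and path below are made up for
illustration): the client only names the namenode in its configuration, but
open() fetches block locations from the namenode and then streams the blocks
straight from each datanode's advertised IP:port, which is why a single
forwarded address does not cover it.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReadSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Only the namenode address appears in the client config.
      conf.set("fs.default.name", "hdfs://namenode.internal:9000");

      FileSystem fs = FileSystem.get(conf);
      // open() asks the namenode where the blocks live, then the client
      // connects directly to those datanode IP:port pairs to read the bytes.
      FSDataInputStream in = fs.open(new Path("/user/stu/part-00000"));
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      System.out.println("read " + n + " bytes");
      in.close();
      fs.close();
    }
  }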

What is the rationale for this firewall arrangement?  Since HDFS has no
permissions, any access is about the same as complete access.


On 9/11/07 2:40 PM, "Stu Hood" <[EMAIL PROTECTED]> wrote:

> Hey gang,
> 
> We're getting ready to deploy our first cluster, and while deciding on the
> node layout, we ran into an interesting question.
> 
> The cluster will be behind a firewall, and a few clients will be on the
> outside. We'd like to minimize the number of external IPs we use, and provide
> a single IP address with forwarded ports for each node (using iptables).
> 
> We've used this method before with simpler "client -> server" protocols, but
> because of Hadoop's "client -> namenode -> client -> datanode" protocol, I'm
> assuming this will not work by default.
> 
> Is it possible to configure the namenode to send clients a different external
> IP/port for the datanodes than the one it uses when it communicates directly?
> 
> Thanks a lot!
> 
> Stu Hood
> 
> Webmail.us
> 
> "You manage your business. We'll manage your email."®
