It is pretty easy to have a proxy of some kind accept files to be put into
HDFS.  Make sure that the proxy doesn't preferentially write blocks to itself,
which happens when the client runs on a datanode.  The easiest way to avoid
that is to run the proxy on a machine that is not part of HDFS.
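
A minimal sketch of what such a proxy could do once it has received a file
(however the file arrives -- scp, HTTP upload, etc.).  The namenode address,
config key, and paths below are made-up placeholders, not anything from this
thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsUploadProxy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The proxy sits inside the firewall, so the internal namenode
            // address is fine here (placeholder host/port).
            conf.set("fs.default.name", "hdfs://namenode.internal:9000");

            FileSystem fs = FileSystem.get(conf);

            // args[0]: local file the proxy received from an external client
            // args[1]: destination path in HDFS
            Path local = new Path(args[0]);
            Path remote = new Path(args[1]);

            // The proxy is just another HDFS client; as long as it is not
            // itself a datanode, blocks get spread across the cluster rather
            // than piling up on the proxy machine.
            fs.copyFromLocalFile(local, remote);
            fs.close();
        }
    }

External clients then only ever talk to the proxy, so the namenode and
datanodes never need to be reachable from outside the firewall.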


On 9/11/07 3:06 PM, "Stu Hood" <[EMAIL PROTECTED]> wrote:

> We would definitely limit the IP ranges that were allowed to connect via the
> external IP to prevent complete access: the clients in this case would be in
> other data centers with known addresses.
> 
> I'm less concerned with being able to submit jobs remotely than with being
> able to access the DFS remotely. The plan was to have other data centers act
> as Hadoop clients and push new files. Perhaps I should look for a solution
> that puts all of the HClients inside the firewall?
> 
> Thanks,
> Stu
> 
> 
> 
> -----Original Message-----
> From: Ted Dunning
> Sent: Tuesday, September 11, 2007 5:52pm
> To: hadoop-user@lucene.apache.org
> Subject: Re: Hadoop behind a Firewall
> 
> 
> 
> If the only purpose of the clients is to launch map-reduce jobs, you may be
> able to get away with some DNS evil to limit the number of external IPs.
> You can also use the diagnostic HTTP interfaces to see data with limited
> access.  Other than such severely limited operation, you will be hard
> pressed, because the whole point of HDFS is that the client communicates
> directly with the datanodes when reading or writing.
> 
> What is the rationale for this firewall arrangement?  Since HDFS has no
> permissions, any access is about the same as complete access.
> 
> 
> On 9/11/07 2:40 PM, "Stu Hood"  wrote:
> 
>> Hey gang,
>> 
>> We're getting ready to deploy our first cluster, and while deciding on the
>> node layout, we ran into an interesting question.
>> 
>> The cluster will be behind a firewall, and a few clients will be on the
>> outside. We'd like to minimize the number of external IPs we use, and provide
>> a single IP address with forwarded ports for each node (using iptables).
>> 
>> We've used this method before with simpler "client -> server" protocols, but
>> because of Hadoop's "client -> namenode -> client -> datanode" protocol, I'm
>> assuming this will not work by default.
>> 
>> Is it possible to configure the namenode to send clients a different external
>> IP/port for the datanodes than the one it uses when it communicates directly?
>> 
>> Thanks a lot!
>> 
>> Stu Hood
>> 
>> Webmail.us
>> 
>> "You manage your business. We'll manage your email."®
> 
