While you can proxy puts/gets to HDFS, this can dramatically decrease your bandwidth. The Hadoop DFS client is pretty good about writing to and reading from multiple HDFS nodes simultaneously; a proxy makes this impossible.
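For concreteness, the put side of such a proxy would look roughly like the sketch below, using the standard FileSystem API. The namenode URI, class name, and destination path are only placeholders, and however the bytes reach the gateway is left out. The key point is that every byte funnels through this one box before the DFS client fans it out to the datanodes:

import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch of a gateway that accepts a stream from an external client and
// writes it into HDFS. The namenode address and destination path are
// placeholders; the transport that delivers 'in' is not shown.
public class HdfsPutGateway {

    public static void copyIntoHdfs(InputStream in, String dst) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the namenode inside the firewall (placeholder URI).
        conf.set("fs.default.name", "hdfs://namenode.internal:9000");

        FileSystem fs = FileSystem.get(conf);
        OutputStream out = fs.create(new Path(dst));
        // Stream the bytes into HDFS; the DFS client writes the blocks out
        // to the datanodes from here, so all traffic crosses this host.
        IOUtils.copyBytes(in, out, 4096, true); // true = close both streams
    }
}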
Of course, depending on your cluster size, network connection, and data
size, you may not care.

-Michael

On 9/11/07 3:15 PM, "Ted Dunning" <[EMAIL PROTECTED]> wrote:

> It is pretty easy to have a proxy of some kind accept files to be put
> into HDFS. Make sure that the proxy doesn't preferentially write to
> itself. The easiest way to avoid that is to have the proxy outside of
> HDFS.
>
> On 9/11/07 3:06 PM, "Stu Hood" <[EMAIL PROTECTED]> wrote:
>
>> We would definitely limit the IP ranges that were allowed to connect
>> via the external IP to prevent complete access: the clients in this
>> case would be in other data centers with known addresses.
>>
>> I'm less concerned with being able to submit jobs remotely than with
>> being able to access the DFS remotely. The plan was to have other
>> data centers act as Hadoop clients and push new files. Perhaps I
>> should look for a solution that puts all of the HClients inside the
>> firewall?
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: Ted Dunning
>> Sent: Tuesday, September 11, 2007 5:52pm
>> To: hadoop-user@lucene.apache.org
>> Subject: Re: Hadoop behind a Firewall
>>
>> If the only purpose of the clients is to launch map-reduce jobs, you
>> may be able to get away with some DNS evil to limit the number of
>> external IPs. You can use the diagnostic HTTP interfaces as well to
>> see data with limited access. Other than such severely limited
>> operation, you will be hard pressed, because the whole point of HDFS
>> is that the client communicates directly with the datanodes when
>> reading or writing.
>>
>> What is the rationale for this firewall arrangement? Since HDFS has
>> no permissions, any access is about the same as complete access.
>>
>> On 9/11/07 2:40 PM, "Stu Hood" wrote:
>>
>>> Hey gang,
>>>
>>> We're getting ready to deploy our first cluster, and while deciding
>>> on the node layout, we ran into an interesting question.
>>>
>>> The cluster will be behind a firewall, and a few clients will be on
>>> the outside. We'd like to minimize the number of external IPs we
>>> use, and provide a single IP address with forwarded ports for each
>>> node (using iptables).
>>>
>>> We've used this method before with simpler "client -> server"
>>> protocols, but because of Hadoop's "client -> namenode -> client ->
>>> datanode" protocol, I'm assuming this will not work by default.
>>>
>>> Is it possible to configure the namenode to send clients a
>>> different external IP/port for the datanodes than the one it uses
>>> when it communicates directly?
>>>
>>> Thanks a lot!
>>>
>>> Stu Hood
>>>
>>> Webmail.us
>>>
>>> "You manage your business. We'll manage your email."®
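P.S. To make Ted's last point concrete: the namenode only hands out metadata, including the datanode host:port pairs that the client then contacts directly for the block data. You can see those addresses from the client API; here is a rough sketch (the getFileBlockLocations call is from later client releases, and the path is just a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print the datanode addresses the namenode reports for each block
// of a file. These host:port pairs are what a reading or writing client
// connects to directly, which is why forwarding a single external IP to
// the cluster is not enough on its own. The path is a placeholder.
public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus stat = fs.getFileStatus(new Path("/some/file"));
        BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
        for (BlockLocation block : blocks) {
            for (String name : block.getNames()) { // datanode "host:port"
                System.out.println(name);
            }
        }
    }
}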