On Fri, May 28, 2010 at 12:06 PM, Michael Segel <[email protected]> wrote:
>
> Hi,
>
> You can't do that.

Unfortunately, Mike is right.

> The problem is that Hadoop is going to pick up your external IP address
> because that's what the machine name resolves to. Then your slave nodes
> are on the internal route and you don't see them.
>
> Is it a bug? Maybe. More like a design defect.

Definitely in the design defect category. The host name handling /
binding code is... complicated and not ideal for these types of
situations.
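If you want to check what a given node is actually going to pick up,
here's a minimal sketch (plain JDK, no Hadoop on the classpath) of
roughly the lookup the daemons do at startup. It's not Hadoop's exact
code path, and the class name is just for illustration. Run it on a
slave: if it prints the eth(1) address, that's what gets advertised,
and the internal nodes won't be able to reach it. It's also a quick way
to verify that the DNS rearrangement Mike describes below actually
took.

  // ResolveCheck.java - approximates Hadoop's default hostname lookup,
  // which ultimately goes through InetAddress much like this does.
  import java.net.InetAddress;

  public class ResolveCheck {
    public static void main(String[] args) throws Exception {
      InetAddress local = InetAddress.getLocalHost();
      // The name the OS reports for this machine.
      System.out.println("hostname:  " + local.getHostName());
      // The fully qualified name DNS hands back for that name.
      System.out.println("canonical: " + local.getCanonicalHostName());
      // The address the name resolves to. If this is the external
      // (eth(1)) address, the slaves are effectively unreachable on
      // the internal network.
      System.out.println("address:   " + local.getHostAddress());
    }
  }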
> The workaround is to forget about using the second NIC for
> hadoop/hbase traffic. Or make the internal network match your machine
> name and its DNS information. Then use the second IP address to
> communicate with the outside world.
>
> So if your machine name is foo, and your dominant IP address is on
> eth(0), you want foo.company.com to resolve to eth(0) and
> foo-ext.company.com to resolve to eth(1). It's backwards but it should
> work.
>
> IMHO, after looking at this issue, it really doesn't matter since the
> cloud shouldn't be getting a lot of external traffic except on the
> name node/job tracker nodes which could be multi-homed.

It might be useful in the case where you're streaming data off of HDFS
directly to clients, rather than in the MR or HBase case. Data import /
export comes to mind. Remember that clients establish a direct
connection to data nodes, so a multihomed NN is insufficient. In that
case, "external" doesn't necessarily mean a public (routable) IP, but
simply another network. We've seen use cases for this in some
installations. One example is a data aggregation or ingestion network
that is separate from the Hadoop internal network, from which you'd
like to get data into HDFS.

--
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
