On Fri, May 28, 2010 at 12:06 PM, Michael Segel
<[email protected]> wrote:
>
> Hi,
>
> You can't do that.

Unfortunately, Mike is right.

> The problem is that Hadoop is going to pick up your external IP address 
> because that's what the machine name resolves to. Then your slave nodes are 
> on the internal route and you don't see them.
>
> Is it a bug? Maybe. More like a design defect.

Definitely in the design defect category. The hostname handling /
binding code is... complicated, and not well suited to multihomed
setups like this one.
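
To see concretely what a daemon will pick up on a given box, the
following (plain Java, a simplified sketch, not Hadoop's actual
startup code) prints what the local machine name resolves to, which
is effectively the address the daemon ends up advertising:

    import java.net.InetAddress;

    // Prints what the local machine name resolves to. A Hadoop daemon
    // on this host will generally advertise this address to the rest
    // of the cluster, regardless of which NIC you intended it to use.
    public class WhoAmI {
        public static void main(String[] args) throws Exception {
            InetAddress self = InetAddress.getLocalHost();
            System.out.println("hostname:    " + self.getHostName());
            System.out.println("resolves to: " + self.getHostAddress());
        }
    }

If that prints the external address, that's the address your slaves
and clients will be told to use.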

> The workaround is to forget about using the second NIC for Hadoop/HBase
> traffic. Or make the internal network match your machine name and its DNS
> information. Then use the second IP address to communicate with the outside
> world.
>
> So if your machine name is foo, and your dominant IP address is on eth(0),
> you want foo.company.com to resolve to eth(0) and foo-ext.company.com
> to resolve to eth(1). It's backwards, but it should work.
>
> IMHO, after looking at this issue, it really doesn't matter, since the
> cluster shouldn't be getting a lot of external traffic except on the
> NameNode/JobTracker nodes, which could be multihomed.

It might be useful in the case where you're streaming data off of HDFS
directly to clients, rather than in the MR or HBase case. Data import /
export comes to mind. Remember that clients establish a direct
connection to the DataNodes, so a multihomed NN is insufficient. In
that case, "external" doesn't necessarily mean a public (routable) IP,
but simply another network. We've seen use cases for this in some
installations. One example is where a data aggregation or ingestion
network is separate from the Hadoop internal network and you'd like to
get data into HDFS.
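
For anyone attempting Mike's workaround, the naming scheme he describes
boils down to hosts/DNS entries along these lines (addresses invented
purely for illustration):

    # /etc/hosts (or the equivalent DNS records); made-up addresses
    10.0.0.5   foo.company.com      foo       # eth(0), Hadoop-internal traffic
    192.0.2.5  foo-ext.company.com  foo-ext   # eth(1), outside world

If I remember correctly there are also partial knobs like
dfs.datanode.dns.interface and slave.host.name, but they don't cover
every daemon and client path, which is why the DNS approach tends to
be the path of least resistance.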

-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
