[ https://issues.apache.org/jira/browse/MAPREDUCE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261753#comment-14261753 ]
Allen Wittenauer commented on MAPREDUCE-4168:
---------------------------------------------
Why would one use multiple nics?
The easy and obvious reason is security. It's an extremely desirable config to
have compute nodes be 'outbound only'. If YARN is providing compute power to an
automated system, there is no reason for the client to talk to anything other
than the RM and maybe the proxy server. Input is fetched via some other system
and output is pushed as the last part of the pipeline.
By the same token, some networks have a separate admin network that is
used for operational processes. That network is trusted; the user-facing one
is not.
Hadoop supports multiple file systems. There are many filesystem designs where
it's reasonable to configure another nic acting as a backhaul to the backup
infrastructure. The last thing you'd want going across that pipe is user
network traffic.
... and those are just the ones off the top of my head.
bq. Clients that would need RPC connectivity to compute nodes, would be within
cluster network.
This seems to be too narrow a view of the potential operating environment.
In other words, who says that the multiple nics are there because Hadoop needs
them? What if Hadoop is going into a DC that is brownfield or has other
custom needs? Of course, as pointed out above, it's trivial to come up with a
realistic use case where clients only need RPC access to master nodes, which
are, unfortunately, the key problem bits.
> Support multiple network interfaces
> -----------------------------------
>
> Key: MAPREDUCE-4168
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4168
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Tom White
>
> Umbrella jira to track the MapReduce side of HADOOP-8198.