[https://issues.apache.org/jira/browse/HADOOP-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329271#comment-14329271]
Colin Patrick McCabe commented on HADOOP-11574:
-----------------------------------------------
Thanks again, Steve. It's great to see some activity on this.
I think from a pragmatic point of view, we might want to limit this JIRA to
just better network error diagnostics. Expanding the scope to cover
HADOOP-8198 might make it harder to complete. (Of course, if you've got the
bandwidth to implement a full multi-NIC solution for Hadoop, that would be
great.) But it really seems like HADOOP-8198 is a big enough JIRA that it
should have its own set of subtasks, rather than being a subtask of this JIRA.
Another important point is that a lot of people already run multi-NIC Hadoop
setups via interface bonding. You can make two hardware Ethernet cards (or
onboard ports, etc.) look like one by loading the Linux Ethernet bonding
driver, and then no Java code changes are needed. Of course, this doesn't cover
every multi-NIC case, but it does help explain why multi-NIC hasn't been much
of a pain point for us (and why HADOOP-8198 hasn't been completed).
> Uber-JIRA: improve Hadoop network resilience & diagnostics
> ----------------------------------------------------------
>
> Key: HADOOP-11574
> URL: https://issues.apache.org/jira/browse/HADOOP-11574
> Project: Hadoop Common
> Issue Type: Task
> Components: net
> Affects Versions: 2.6.0
> Reporter: Steve Loughran
>
> Improve Hadoop's resilience to bad network conditions/problems, including
> * improving recognition of problem states
> * improving diagnostics
> * better handling of IPv6 addresses, even if the protocol is unsupported.
> * better behaviour client-side when there are connectivity problems (i.e.
> while some errors you can spin on, DNS failures are not on the list).
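The last bullet's distinction between errors a client can spin on and DNS failures can be illustrated with a small Java sketch. It assumes a hypothetical helper, `NetworkErrorClassifier` (not part of Hadoop), that treats `java.net.UnknownHostException` as fail-fast and `ConnectException` / `SocketTimeoutException` as retryable, and that names the endpoint in the error message, bracketing IPv6 literals. These classification choices are illustrative assumptions, not Hadoop's actual retry policy.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/**
 * Hypothetical helper, not part of Hadoop: decides whether a client-side
 * network failure is worth retrying and formats a diagnostic message.
 */
final class NetworkErrorClassifier {

    enum Action { RETRY, FAIL_FAST }

    static Action classify(IOException e) {
        // DNS resolution failed: spinning on the same lookup rarely helps,
        // so report it immediately instead of retrying.
        if (e instanceof UnknownHostException) {
            return Action.FAIL_FAST;
        }
        // Refused connections and timeouts are often transient
        // (e.g. a server restarting), so they are retry candidates.
        if (e instanceof ConnectException || e instanceof SocketTimeoutException) {
            return Action.RETRY;
        }
        // Assumption: anything else is treated conservatively as fatal.
        return Action.FAIL_FAST;
    }

    /** Names the endpoint in the message; an unbracketed IPv6 literal gets brackets. */
    static String describe(String host, int port, IOException e) {
        String endpoint = host.indexOf(':') >= 0
                ? "[" + host + "]:" + port
                : host + ":" + port;
        return e.getClass().getSimpleName() + " connecting to " + endpoint
                + " (" + classify(e) + "): " + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(describe("nn1.example.com", 8020,
                new UnknownHostException("nn1.example.com")));
        System.out.println(describe("::1", 8020,
                new ConnectException("Connection refused")));
    }
}
```

Running `main` prints `UnknownHostException connecting to nn1.example.com:8020 (FAIL_FAST): nn1.example.com` and then `ConnectException connecting to [::1]:8020 (RETRY): Connection refused`. The host name and port are placeholders. Including the endpoint and the retry decision in the message is one way to make a failure diagnosable from the log line alone.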