[ https://issues.apache.org/jira/browse/HADOOP-11226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288518#comment-14288518 ]
Chris Nauroth commented on HADOOP-11226:
----------------------------------------
I think it's reasonable to favor low-latency handling on IPC connections. As I
understand it, in the scenario Gopal originally reported, this change would
have shifted the packet loss burden onto shuffle traffic or data transfer
protocol traffic. That's probably a good trade-off, because there tends to be a
higher expectation of responsiveness from IPC for things like user commands
sent to the NameNode or ResourceManager, or even just heartbeats.

A potential concern is that Hadoop runs on a wide variety of network
architectures, which makes it difficult to test changes like this
comprehensively. I see this has been labeled "Infiniband", so that's one
scenario covered. I'd advise waiting a few more days to see if anyone else
wants to review.

Community members, if you think you're running Hadoop on a unique network
architecture, please help by reviewing this patch.
> ipc.Client has to use setTrafficClass() with IPTOS_LOWDELAY|IPTOS_RELIABILITY
> -----------------------------------------------------------------------------
>
> Key: HADOOP-11226
> URL: https://issues.apache.org/jira/browse/HADOOP-11226
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Affects Versions: 2.6.0
> Reporter: Gopal V
> Assignee: Gopal V
> Labels: Infiniband
> Attachments: HADOOP-11226.1.patch, HADOOP-11226.2.patch
>
>
> During heavy shuffle, packet loss for IPC packets was observed from a machine.
> Avoid packet loss and speed up transfer by setting the 0x14 QoS bits
> (IPTOS_LOWDELAY|IPTOS_RELIABILITY) on the packets.
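
For readers unfamiliar with the API named in the issue title, here is a minimal
sketch of what requesting those bits on a plain java.net.Socket might look
like. It is not taken from the attached patches. The class name, constant
names, and helper method are hypothetical, and the bit values are the ones
listed in the Socket.setTrafficClass() Javadoc. The underlying network stack
may ignore the request, and the Javadoc cautions against assuming the field can
be changed after a TCP connection is established, so the sketch sets it on an
unconnected socket.

```java
import java.io.IOException;
import java.net.Socket;

// Hypothetical illustration only; not the code from the HADOOP-11226 patches.
public class TrafficClassSketch {
    // Type-of-service bit values as documented for Socket.setTrafficClass().
    static final int IPTOS_RELIABILITY = 0x04;
    static final int IPTOS_LOWDELAY = 0x10;

    // 0x10 | 0x04 == 0x14, the value quoted in the issue description.
    static final int IPC_TRAFFIC_CLASS = IPTOS_LOWDELAY | IPTOS_RELIABILITY;

    // Requests low-delay, high-reliability handling for the given socket.
    // This is only a hint; the platform is free to ignore it.
    static void markLowDelayReliable(Socket socket) throws IOException {
        socket.setTrafficClass(IPC_TRAFFIC_CLASS);
    }

    public static void main(String[] args) throws IOException {
        // An unconnected socket, so the hint is set before any connect() call.
        try (Socket socket = new Socket()) {
            markLowDelayReliable(socket);
            System.out.println("requested traffic class: 0x"
                + Integer.toHexString(IPC_TRAFFIC_CLASS));
        }
    }
}
```

Whether routers and switches along the path honor these bits depends on the
network configuration, which is part of why testing across different
architectures matters here.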