[ https://issues.apache.org/jira/browse/TEZ-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884986#comment-15884986 ]
Rajesh Balamohan commented on TEZ-3633:
---------------------------------------
Thanks for sharing the patch in MAPREDUCE-6850 [~jeagles]. I remember doing
some benchmarks based on the initial patches for MAPREDUCE-5787, so I was
checking the earlier details.
In MAPREDUCE-5787, the keep-alive parameter checks were present up to and including
https://issues.apache.org/jira/secure/attachment/12634984/MAPREDUCE-5787-2.4.0-v3.patch
in the following form:
{noformat}
if (!keepAlive && !keepAliveParam) {
  lastMap.addListener(ChannelFutureListener.CLOSE);
}
{noformat}
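The quoted check boils down to a small decision: close the channel after the last map output only if neither the server configuration nor the request asked for keep-alive. Below is a minimal, self-contained Java sketch of that decision. It assumes, hypothetically, that `keepAlive` is a server-side setting and that `keepAliveParam` comes from exactly one `keepAlive=true` query parameter on the fetch URL. The class and method names are illustrative and are not taken from either patch. The sketch models only the decision, not the Netty channel handling.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not from MAPREDUCE-5787 or TEZ-3633.
final class KeepAliveDecision {

    private KeepAliveDecision() {}

    // Splits a raw query string into name -> values (no URL decoding, for brevity).
    static Map<String, List<String>> parseQuery(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    // Assumption: the client requests keep-alive with exactly one keepAlive=true parameter.
    static boolean keepAliveParam(URI requestUri) {
        List<String> values = parseQuery(requestUri.getRawQuery()).get("keepAlive");
        return values != null && values.size() == 1 && Boolean.parseBoolean(values.get(0));
    }

    // Same condition as `if (!keepAlive && !keepAliveParam)`: close after the last
    // map output only when neither the server nor the request enabled keep-alive.
    static boolean closeAfterLastMap(boolean keepAlive, boolean keepAliveParam) {
        return !keepAlive && !keepAliveParam;
    }
}
```

For example, `closeAfterLastMap(false, keepAliveParam(URI.create("http://shuffle.example/mapOutput?job=j1&keepAlive=true")))` returns `false`, so the connection would stay open for the next fetch. Dropping this check means the close decision no longer depends on what the request asked for.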
However, this check was dropped during refactoring in subsequent patches, and
that caused this problem: the server relied on the client to close the
connection. That is, it was the responsibility of the client (the JDK's internal
HTTP client) to terminate the connection after the keep-alive timeout. The patch
proposed in this JIRA also addresses that scenario: the server automatically
closes the connection once the keep-alive timeout configured on the server side
is exceeded.
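One way to picture the server-side timeout is a tracker that records when each connection last served a request. It then closes any connection whose idle time exceeds the configured threshold, rather than waiting for the client to do so. The sketch below is hypothetical: the connection ids, the explicit millisecond clock, and the `reap` method are illustrative assumptions, not the API of the TEZ-3633 patch. A real server would more likely run this check from a timer or an idle-event handler on its event loop. Here the clock is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a server-side keep-alive timeout; not the TEZ-3633 code.
final class KeepAliveReaper {
    private final long timeoutMillis;
    // Connection id -> time (ms) of the last request served on that connection.
    private final Map<String, Long> lastActivity = new HashMap<>();

    KeepAliveReaper(long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        this.timeoutMillis = timeoutMillis;
    }

    // Records activity; a new connection starts being tracked here.
    void touch(String connectionId, long nowMillis) {
        lastActivity.put(connectionId, nowMillis);
    }

    // Value for a "Keep-Alive" response header advertising the timeout in whole seconds.
    String keepAliveHeaderValue() {
        return "timeout=" + (timeoutMillis / 1000);
    }

    // Server-initiated close: drops and returns every connection idle longer than the timeout.
    List<String> reap(long nowMillis) {
        List<String> closed = new ArrayList<>();
        lastActivity.entrySet().removeIf(e -> {
            boolean expired = nowMillis - e.getValue() > timeoutMillis;
            if (expired) {
                closed.add(e.getKey());
            }
            return expired;
        });
        Collections.sort(closed); // deterministic order for callers and tests
        return closed;
    }

    boolean isOpen(String connectionId) {
        return lastActivity.containsKey(connectionId);
    }
}
```

With a 5000 ms timeout, a connection last touched at t=0 survives `reap(5000)` but is closed by `reap(5001)`, because only idle time strictly greater than the threshold triggers a close. Advertising `timeout=5` in the header and enforcing the same value on the server keeps what the response promises consistent with what the server actually does.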
> Implement keep-alive timeout in tez shuffle handler
> ---------------------------------------------------
>
> Key: TEZ-3633
> URL: https://issues.apache.org/jira/browse/TEZ-3633
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3633.1.patch, with_hadoop_2.7.3.png
>
>
> MAPREDUCE-5787, which added keep-alive to the mapreduce shuffle handler, was
> not fully functional: despite advertising the keep-alive option and adding the
> header to the response, all connections were closed immediately after the write.
> This reduced the performance of certain fetches, as time is now spent
> requesting a second GET from the same server, only for that server to reset the
> connection, forcing the client to reestablish the connection on another port.
> The details of this are hidden behind HttpURLConnection and don't show up in
> any log file at the default logging level. However, TCP sniffing does show the
> errant behavior.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)