[
https://issues.apache.org/jira/browse/TEZ-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877526#comment-15877526
]
Rajesh Balamohan commented on TEZ-3633:
---------------------------------------
I enabled keep-alive in tez and mapreduce shuffle handler. (i.e
tez.runtime.shuffle.keep-alive.enabled=true and
mapreduce.shuffle.connection-keep-alive.enable=true)
To verify keep-alive connections are working without connection breaks, I
enabled wire logging in HttpUrlConnection as follows
Created the following file {{http.logger.properties}} and placed in {{/tmp/}}
folder in test cluster
{noformat}
handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = ALL
java.util.logging.ConsoleHandler.formatter =
sun.net.www.protocol.http.HttpLogFormatter
sun.net.www.protocol.http.HttpURLConnection.level = ALL
sun.net.www.protocol.http.level=ALL
{noformat}
Tried out query-17 from TPC-DS with hive/tez which is relatively shuffle
intensive in hive-1.
{noformat}
hive --hiveconf
tez.task.launch.cmd-opts="-Djava.util.logging.config.file=/tmp/http.logger.properties"
{noformat}
>From the logs, I see logs of {{KeepAlive stream used}} which does indicate
>that keepAlive connections being used from sun's internal implementation.
>Hadoop version used is 2.7.3 from HDP.
{noformat}
FINE: sun.net.www.MessageHeader@4f81a00c8 pairs: {GET
/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true
HTTP/1.1: null}{UrlHash: rjk3o9nG2khcu5d+QGw1Gk1nhBY=}{name:
mapreduce}{version: 1.0.0}{User-Agent: Java/1.8.0_121}{Host:
ip-10-0-1-66.ec2.internal:13562}{Accept: text/html, image/gif, image/jpeg, *;
q=.2, */*; q=.2}{Connection: keep-alive}
Feb 22, 2017 5:16:37 AM sun.net.www.http.HttpClient logFinest
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true
Feb 22, 2017 5:16:37 AM sun.net.www.protocol.http.HttpURLConnection
getInputStream0
FINE: sun.net.www.MessageHeader@4146578c7 pairs: {null: HTTP/1.1 200
OK}{ReplyHash: wox3/ak3+1T3PwXc8eCk9C4wtDs=}{name: mapreduce}{version:
1.0.0}{Content-Length: 1156}{Connection: Keep-Alive}{Keep-Alive: timeout=5}
Feb 22, 2017 5:16:38 AM sun.net.www.protocol.http.HttpURLConnection
plainConnect0
...
FINEST: ProxySelector Request for
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000002_0_10006,attempt_1487556355205_0099_1_06_000003_0_10007,attempt_1487556355205_0099_1_06_000004_0_10006,attempt_1487556355205_0099_1_06_000001_0_10005,attempt_1487556355205_0099_1_06_000000_0_10005,attempt_1487556355205_0099_1_06_000007_0_10003,attempt_1487556355205_0099_1_06_000005_0_10002,attempt_1487556355205_0099_1_06_000006_0_10002,attempt_1487556355205_0099_1_06_000027_0_10023,attempt_1487556355205_0099_1_06_000033_0_10023,attempt_1487556355205_0099_1_06_000039_0_10026,attempt_1487556355205_0099_1_06_000044_0_10026,attempt_1487556355205_0099_1_06_000049_0_10026,attempt_1487556355205_0099_1_06_000038_0_10017,attempt_1487556355205_0099_1_06_000046_0_10017,attempt_1487556355205_0099_1_06_000048_0_10026&keepAlive=true
...
FINEST: KeepAlive stream retrieved from the cache,
sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true)
Feb 22, 2017 5:16:38 AM sun.net.www.protocol.http.HttpURLConnection
plainConnect0
........ printing more KeepAlive log lines alone from here onwards for
reference..
......
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_00_000000_0_10003&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000003_0_10007,attempt_1487556355205_0099_1_06_000004_0_10006,attempt_1487556355205_0099_1_06_000001_0_10005,attempt_1487556355205_0099_1_06_000000_0_10005&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000007_0_10003,attempt_1487556355205_0099_1_06_000005_0_10002&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000001_0_10010&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000000_0_10010&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000002_0_10006&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000006_0_10002,attempt_1487556355205_0099_1_06_000027_0_10023,attempt_1487556355205_0099_1_06_000033_0_10023,attempt_1487556355205_0099_1_06_000039_0_10026&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_03_000000_0_10003&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache,
sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000001_0_10010&keepAlive=true)
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000044_0_10026&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache,
sun.net.www.http.HttpClient(http://ip-10-0-1-228.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000021_0_10007,attempt_1487556355205_0099_1_04_000034_0_10007,attempt_1487556355205_0099_1_04_000037_0_10006,attempt_1487556355205_0099_1_04_000027_0_10006,attempt_1487556355205_0099_1_04_000033_0_10006,attempt_1487556355205_0099_1_04_000036_0_10007,attempt_1487556355205_0099_1_04_000031_0_10006,attempt_1487556355205_0099_1_04_000040_0_10006,attempt_1487556355205_0099_1_04_000030_0_10006&keepAlive=true)
FINEST: KeepAlive stream used:
http://ip-10-0-1-228.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000034_0_10011&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_02_000000_0_10003&keepAlive=true
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000043_0_10013,attempt_1487556355205_0099_1_04_000046_0_10022,attempt_1487556355205_0099_1_04_000049_0_10022,attempt_1487556355205_0099_1_04_000047_0_10013,attempt_1487556355205_0099_1_04_000048_0_10021&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache,
sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_04_000000_0_10010&keepAlive=true)
FINEST: KeepAlive stream used:
http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000049_0_10026&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache,
sun.net.www.http.HttpClient(http://ip-10-0-1-66.ec2.internal:13562/mapOutput?job=job_1487556355205_0099&reduce=0&map=attempt_1487556355205_0099_1_06_000044_0_10026&keepAlive=true)
{noformat}
> Implement keep-alive timeout in tez shuffle handler
> ---------------------------------------------------
>
> Key: TEZ-3633
> URL: https://issues.apache.org/jira/browse/TEZ-3633
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3633.1.patch
>
>
> MAPREDUCE-5787 which added keep-alive to mapreduce shuffle handler was not
> fully functional as despite advertising keep-alive option and adding the
> header to the response, all connections were closed immediately after write.
> This reduced the performance of certain fetches as now time is spent
> requesting a second get to the same serve, only for that server to reset the
> connection forcing the client to reestablish the connection on another port.
> The details of this is hidden behind HttpURLConnection and doesn't show in
> any log file at default logging level. However TCP sniffing does show errant
> behavior.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)