[ 
https://issues.apache.org/jira/browse/TEZ-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877721#comment-15877721
 ] 

Jonathan Eagles edited comment on TEZ-3633 at 2/22/17 4:17 PM:
---------------------------------------------------------------

[~rajesh.balamohan], I used the above logging technique to compare 
HttpURLConnection to what is happening at the network level. Below is a 
HttpURLConnection log and below it is a wireshark capture. You can see that the 
HttpURLLogging suggests that keep alive connections are being used but in 
reality the remote side issues a reset (RST) FIN and FIN, ACK were removed for 
brevity. You can see that when connection is attempted to be reused, the server 
resets the connection and the fetcher thread reattempts the request on a new 
connection. You can see the reattempts are new connections since they happen on 
a different port with the exactly the same request. The job is a 14:1:1 
OrderedWordCount job. My guess is that HttpURLConnection sees the keep-alive 
response header and is re-connecting silently behind the scenes. Do you have a 
way to get a network sniff of the above job to see if you are seeing the same 
behavior?

{noformat:title=HttpURLConnection logging}
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000004_0_10003,attempt_1487743755036_0018_1_00_000005_0_10003&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000004_0_10003,attempt_1487743755036_0018_1_00_000005_0_10003&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000010_0_10005&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000010_0_10005&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000009_0_10006&keepAlive=true
{noformat}

{noformat:title=Wireshark capture}
No      Time            srcaddr          srcport        dstaddr dstport 
protocol        length  info
10665   19.595725       127.0.0.1       49670   127.0.0.1       13563   HTTP    
446     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000004_0_10003,attempt_1487743755036_0018_1_00_000005_0_10003&keepAlive=true
 HTTP/1.1 
11726   19.697864       127.0.0.1       49670   127.0.0.1       13563   HTTP    
587     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true
 HTTP/1.1 
11727   19.697939       127.0.0.1       13563   127.0.0.1       49670   TCP     
44      13563 → 49670 [RST] Seq=17506303 Win=0 Len=0
11736   19.700396       127.0.0.1       49671   127.0.0.1       13563   HTTP    
587     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true
 HTTP/1.1 
13758   19.886892       127.0.0.1       13563   127.0.0.1       49671   TCP     
56      13563 → 49671 [FIN, ACK] Seq=44667349 Ack=532 Win=407744 Len=0 
TSval=881231323 TSecr=881231323
15865   21.339273       127.0.0.1       49671   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true
 HTTP/1.1 
15866   21.339350       127.0.0.1       13563   127.0.0.1       49671   TCP     
44      13563 → 49671 [RST] Seq=44667350 Win=0 Len=0
15871   21.340217       127.0.0.1       49672   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true
 HTTP/1.1 
16330   21.652935       127.0.0.1       13563   127.0.0.1       49672   TCP     
44      13563 → 49672 [RST] Seq=4997533 Win=0 Len=0
16339   21.653976       127.0.0.1       49675   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000010_0_10005&keepAlive=true
 HTTP/1.1 
16949   21.963803       127.0.0.1       49675   127.0.0.1       13563   HTTP    
446     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true
 HTTP/1.1 
16950   21.963865       127.0.0.1       13563   127.0.0.1       49675   TCP     
44      13563 → 49675 [RST] Seq=7275696 Win=0 Len=0
16959   21.964488       127.0.0.1       49678   127.0.0.1       13563   HTTP    
446     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true
 HTTP/1.1 
17617   21.982604       127.0.0.1       13563   127.0.0.1       49678   TCP     
56      13563 → 49678 [FIN, ACK] Seq=17117476 Ack=391 Win=407904 Len=0 
TSval=881233239 TSecr=881233238
17622   21.988410       127.0.0.1       49678   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000009_0_10006&keepAlive=true
 HTTP/1.1 
17623   21.988474       127.0.0.1       13563   127.0.0.1       49678   TCP     
44      13563 → 49678 [RST] Seq=17117477 Win=0 Len=0
17628   21.989139       127.0.0.1       49679   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000009_0_10006&keepAlive=true
 HTTP/1.1 
18080   21.998431       127.0.0.1       13563   127.0.0.1       49679   TCP     
56      13563 → 49679 [FIN, ACK] Seq=8188579 Ack=344 Win=407936 Len=0 
TSval=881233253 TSecr=881233253
{noformat}





was (Author: jeagles):
[~rajesh.balamohan], I used the above logging technique to compare 
HttpURLConnection to what is happening at the network level. Below is a 
HttpURLConnection log and below it is a wireshark capture. You can see that the 
HttpURLLogging suggests that keep alive connections are being used but in 
reality the remote side issues a reset (RST) FIN and FIN, ACK were removed for 
brevity. You can see that when connection is attempted to be reused, the server 
resets the connection and the fetcher thread reattempts the request on a new 
connection. You can see the reattempts are new connections since they happen on 
a different port with the exactly the same request. The job is a 14:1:1 
OrderedWordCount job. My guess is that HttpURLConnection sees the keep-alive 
response header and is re-connecting silently behind the scenes. Do you have a 
way to get a network sniff of the above job to see if you are seeing the same 
behavior?

{noformat:title=HttpURLConnection logging}
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000004_0_10003,attempt_1487743755036_0018_1_00_000005_0_10003&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000004_0_10003,attempt_1487743755036_0018_1_00_000005_0_10003&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000010_0_10005&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000010_0_10005&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true
FINEST: KeepAlive stream retrieved from the cache, 
sun.net.www.http.HttpClient(http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true)
FINEST: KeepAlive stream used: 
http://localhost:13563/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000009_0_10006&keepAlive=true
{noformat}

{noformat:title=Wireshark capture}
No      Time            srcaddr          srcport        dstaddr dstport 
protocol        length  info
10665   19.595725       127.0.0.1       49670   127.0.0.1       13563   HTTP    
446     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000004_0_10003,attempt_1487743755036_0018_1_00_000005_0_10003&keepAlive=true
 HTTP/1.1 
11726   19.697864       127.0.0.1       49670   127.0.0.1       13563   HTTP    
587     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true
 HTTP/1.1 
11727   19.697939       127.0.0.1       13563   127.0.0.1       49670   TCP     
44      13563 → 49670 [RST] Seq=17506303 Win=0 Len=0
11736   19.700396       127.0.0.1       49671   127.0.0.1       13563   HTTP    
587     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000002_0_10003,attempt_1487743755036_0018_1_00_000000_0_10003,attempt_1487743755036_0018_1_00_000003_0_10003,attempt_1487743755036_0018_1_00_000006_0_10003,attempt_1487743755036_0018_1_00_000001_0_10003&keepAlive=true
 HTTP/1.1 
15865   21.339273       127.0.0.1       49671   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true
 HTTP/1.1 
15866   21.339350       127.0.0.1       13563   127.0.0.1       49671   TCP     
44      13563 → 49671 [RST] Seq=44667350 Win=0 Len=0
15871   21.340217       127.0.0.1       49672   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000011_0_10006&keepAlive=true
 HTTP/1.1 
16330   21.652935       127.0.0.1       13563   127.0.0.1       49672   TCP     
44      13563 → 49672 [RST] Seq=4997533 Win=0 Len=0
16339   21.653976       127.0.0.1       49675   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000010_0_10005&keepAlive=true
 HTTP/1.1 
16949   21.963803       127.0.0.1       49675   127.0.0.1       13563   HTTP    
446     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true
 HTTP/1.1 
16950   21.963865       127.0.0.1       13563   127.0.0.1       49675   TCP     
44      13563 → 49675 [RST] Seq=7275696 Win=0 Len=0
16959   21.964488       127.0.0.1       49678   127.0.0.1       13563   HTTP    
446     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000007_0_10006,attempt_1487743755036_0018_1_00_000008_0_10006&keepAlive=true
 HTTP/1.1 
17622   21.988410       127.0.0.1       49678   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000009_0_10006&keepAlive=true
 HTTP/1.1 
17623   21.988474       127.0.0.1       13563   127.0.0.1       49678   TCP     
44      13563 → 49678 [RST] Seq=17117477 Win=0 Len=0
17628   21.989139       127.0.0.1       49679   127.0.0.1       13563   HTTP    
399     GET 
/mapOutput?job=job_1487743755036_0018&dag=1&reduce=0&map=attempt_1487743755036_0018_1_00_000009_0_10006&keepAlive=true
 HTTP/1.1 
{noformat}




> Implement keep-alive timeout in tez shuffle handler
> ---------------------------------------------------
>
>                 Key: TEZ-3633
>                 URL: https://issues.apache.org/jira/browse/TEZ-3633
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3633.1.patch
>
>
> MAPREDUCE-5787 which added keep-alive to mapreduce shuffle handler was not 
> fully functional as despite advertising keep-alive option and adding the  
> header to the response, all connections were closed immediately after write. 
> This reduced the performance of certain fetches as now time is spent 
> requesting a second get to the same serve, only for that server to reset the 
> connection forcing the client to reestablish the connection on another port. 
> The details of this is hidden behind HttpURLConnection and doesn't show in 
> any log file at default logging level. However TCP sniffing does show errant 
> behavior.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to