[jira] [Work logged] (TS-4509) Dropped keep-alive connections not being re-established (TS-3959 continued)

ASF GitHub Bot (JIRA) Mon, 03 Oct 2016 09:11:35 -0700

     [ https://issues.apache.org/jira/browse/TS-4509?focusedWorklogId=30078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30078 ]


ASF GitHub Bot logged work on TS-4509:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Oct/16 16:11
            Start Date: 03/Oct/16 16:11
    Worklog Time Spent: 10m 
      Work Description: Github user jpeach commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/1070#discussion_r81578713
  
    --- Diff: proxy/http/HttpTransact.cc ---
    @@ -6536,14 +6536,23 @@ HttpTransact::is_request_valid(State *s, HTTPHdr *incoming_request)
     // In the general case once bytes have been sent on the wire the request cannot be retried.
     // The reason we cannot retry is that the rfc2616 does not make any gaurantees about the
     // retry-ability of a request. In fact in the reverse proxy case it is quite common for GET
    -// requests on the origin to fire tracking events etc. So, as a proxy once we have sent bytes
    -// on the wire to the server we cannot gaurantee that the request is safe to redispatch to another server.
    +// requests on the origin to fire tracking events etc. So, as a proxy once bytes have been ACKd
    --- End diff --
    
    So, as a proxy, once ...
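The revised comment appears to narrow the no-retry rule from "bytes have been sent" to "bytes have been ACKd". One way to read that rule is as a predicate over what the origin could have observed. This is a minimal sketch under stated assumptions, not Traffic Server's code: `AttemptState`, its fields and `is_safe_to_redispatch` are hypothetical names, and it assumes the proxy can count acknowledged request bytes and received response bytes for a single attempt.

```cpp
#include <cstdint>

// Hypothetical per-attempt bookkeeping; these are illustrative names,
// not actual Traffic Server fields.
struct AttemptState {
  std::uint64_t request_bytes_acked = 0;     // request bytes the origin has acknowledged
  std::uint64_t response_bytes_received = 0; // response bytes read back from the origin
};

// A request may be redispatched to another server only if the origin
// cannot have acted on it: nothing was acknowledged and nothing came back.
bool is_safe_to_redispatch(const AttemptState &a) {
  return a.request_bytes_acked == 0 && a.response_bytes_received == 0;
}
```

Under this reading, a dropped keep-alive connection that closes before acknowledging any request bytes would still qualify for a retry.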


Issue Time Tracking
-------------------

    Worklog Id:     (was: 30078)
    Time Spent: 1h  (was: 50m)

> Dropped keep-alive connections not being re-established (TS-3959 continued)
> ---------------------------------------------------------------------------
>
>                 Key: TS-4509
>                 URL: https://issues.apache.org/jira/browse/TS-4509
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core, Network
>            Reporter: Thomas Jackson
>            Assignee: Thomas Jackson
>            Priority: Blocker
>             Fix For: 7.1.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> I've observed some differences in how TrafficServer 6.0.0 behaves with 
> connection retrying and outgoing keep-alive connections. I believe the 
> changes in behavior might be related to this issue: 
> https://issues.apache.org/jira/browse/TS-3440
> I originally wasn't sure if this was a bug, but James Peach indicated it 
> sounded more like a regression on the mailing list 
> (http://mail-archives.apache.org/mod_mbox/trafficserver-users/201510.mbox/%[email protected]%3e).
> What I'm seeing in 6.0.0 is that if TrafficServer has some backend keep-alive 
> connections already opened, but then one of the keep-alive connections is 
> closed, the next request to TrafficServer may generate a 502 Server Hangup 
> response when attempting to reuse that connection. Previously, I think 
> TrafficServer was retrying when it encountered a closed keep-alive 
> connection, but that is no longer the case. So if you have a backend that 
> might unexpectedly close its open keep-alive connections, the only way I've 
> found to completely prevent these 502 errors in 6.0.0 is to disable outgoing 
> keepalive (proxy.config.http.keep_alive_enabled_out and 
> proxy.config.http.keep_alive_post_out settings).
> Here is a slightly more concrete example of what can trigger this; it is fairly 
> easy to reproduce with the following setup:
> - TrafficServer is proxying to nginx with outgoing keep-alive connections 
> enabled (the default).
> - Throw a constant stream of requests at TrafficServer.
> - While that constant stream of requests is happening, also send a regular 
> stream of SIGHUP commands to nginx to reload nginx.
> - Eventually you'll get some 502 Server Hangup responses from TrafficServer 
> among your stream of requests.
> SIGHUPs in nginx should result in zero downtime for new requests, but I think 
> what's happening is that TrafficServer may fail when an old keep-alive 
> connection is reused (it's not common, so it depends on the timing of things 
> and on whether the connection is from an old nginx worker that has since been shut 
> down). In TrafficServer 5.3.1 these connection failures were retried, but in 
> 6.0.0, no retries occur in this case.
> Here are some debug logs that show the difference in behavior between 6.0.0 and 
> 5.3.1. Note that differences seem to stem from how each version eventually 
> handles the "VC_EVENT_EOS" event following 
> "&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE".
> 5.3.1: 
> https://gist.github.com/GUI/0c53a6c4fdc2782b14aa#file-trafficserver_5-3-1-log-L316
> 6.0.0: 
> https://gist.github.com/GUI/0c53a6c4fdc2782b14aa#file-trafficserver_6-0-0-log-L314
> Interestingly, if I understand the log files correctly, it looks like 
> TrafficServer is reporting an odd empty response from these connections 
> ("HTTP/0.9 0" in 5.3.1 and "HTTP/1.0 0" in 6.0.0). However, as far as I can 
> tell from TCP dumps on the system, nginx is not actually sending any form of 
> response.
> In these example cases the backend server isn't sending back any data (at 
> least as far as I can tell), so from what I understand (and the logic 
> outlined in https://issues.apache.org/jira/browse/TS-3440), it should be safe 
> to retry.
> Let me know if I can provide any other details. Or if exact scripts to 
> reproduce the issues against the example nginx backend I described above 
> would be useful, I could get that together.
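The retry behavior the reporter expects (and that 5.3.1 appears to have had) can be sketched as a small in-process simulation. This is not Traffic Server code: `Conn`, `AttemptResult`, `send_on` and `fetch` are hypothetical, the origin is faked, and a "stale" connection stands in for a keep-alive connection that an old nginx worker has already closed. The key assumption, following the TS-3440 reasoning quoted above, is that a reused connection which hits EOS before any response bytes arrive is safe to retry on another connection.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <string>

// Hypothetical model of an origin connection (not a Traffic Server type).
struct Conn {
  bool reused; // true if taken from the keep-alive pool
  bool stale;  // true if the origin already closed it, e.g. an exited nginx worker
};

// Outcome of one attempt: a response, or EOS plus the response bytes seen.
struct AttemptResult {
  std::optional<std::string> response;
  std::size_t response_bytes = 0;
};

// Fake origin: a stale connection yields EOS right after the request is
// written, with no response data; a live one answers 200.
AttemptResult send_on(const Conn &c) {
  if (c.stale) {
    return AttemptResult{std::nullopt, 0};
  }
  std::string resp = "HTTP/1.1 200 OK";
  return AttemptResult{resp, resp.size()};
}

// Hypothetical retry policy: prefer pooled connections; if a reused one hits
// EOS before any response bytes arrive, assume it was a dropped keep-alive and
// retry on the next connection (a fresh one once the pool is empty).
std::string fetch(std::deque<Conn> &pool, int max_retries) {
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    Conn c{false, false}; // default: a freshly opened, live connection
    if (!pool.empty()) {
      c = pool.front();
      pool.pop_front();
    }
    AttemptResult r = send_on(c);
    if (r.response) {
      return *r.response;
    }
    if (!(c.reused && r.response_bytes == 0)) {
      break; // not provably safe to retry
    }
  }
  return "HTTP/1.1 502 Server Hangup";
}
```

With `max_retries` set to 0, a stale pooled connection produces the 502 described in the report; with 1, the request succeeds on a fresh connection. A real proxy would also need to take the request method into account before retrying.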



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
