Dimitry Andric created TS-4838:
----------------------------------

             Summary: After TS-3612 restructuring, very slow SSL sessions and 
HttpSM::state_raw_http_server_open errors
                 Key: TS-4838
                 URL: https://issues.apache.org/jira/browse/TS-4838
             Project: Traffic Server
          Issue Type: Bug
          Components: Core, SSL
            Reporter: Dimitry Andric


We have been using TrafficServer 5.3.2 for quite some time now, for forward 
proxying of a number of different HTML5 applications, one of the most important 
ones being YouTube's TV interface, e.g. https://youtube.com/tv.  This is all 
hosted on CentOS 7.2 x86_64 machines.

We recently upgraded to 6.2.0, and then started having problems with the 
CONNECT requests for port 443 which are generated by the YouTube app.  It seems 
like these connections are "stalled" somehow, sometimes for >10 seconds.  
Meanwhile, {{diags.log}} is getting spammed with lots of the following:
{noformat}
[Sep  9 16:45:47.683] Server {0x2b3e50c0b700} ERROR: [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 server_entry: (nil)
{noformat}

Requests that seem to stall are most likely all of the CONNECT kind, e.g.:
{noformat}
1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -
1473432382.481 30411 127.0.0.1 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ - DIRECT/i9.ytimg.com -
1473432382.486 30417 127.0.0.1 TCP_MISS/200 5389 CONNECT pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com -
1473432390.451 42772 127.0.0.1 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/ - DIRECT/csi.gstatic.com -
1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT www.youtube.com:443/ - DIRECT/www.youtube.com -
1473432390.483 38414 127.0.0.1 TCP_MISS/200 2870983 CONNECT r17---sn-5hnednl7.googlevideo.com:443/ - DIRECT/r17---sn-5hnednl7.googlevideo.com -
{noformat}
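
For anyone who wants to pull the long-running CONNECT entries out of a larger access log, here is a minimal sketch. It assumes the squid-style field order seen in the excerpt above (timestamp, elapsed milliseconds, client, result/status, bytes, method, target) and one entry per line, i.e. without the mail line-wrapping. The names {{parse_entry}} and {{slow_connects}} and the 10-second threshold are illustrative choices, not part of Traffic Server.

```python
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    timestamp: float   # seconds since the epoch, when the entry was logged
    elapsed_ms: int    # elapsed time of the transaction in milliseconds
    client: str
    result: str        # e.g. "TCP_MISS/200"
    size: int          # bytes sent to the client
    method: str
    target: str


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one unwrapped squid-style log line; return None if it does not fit."""
    fields = line.split()
    if len(fields) < 7:
        return None
    try:
        return Entry(float(fields[0]), int(fields[1]), fields[2], fields[3],
                     int(fields[4]), fields[5], fields[6])
    except ValueError:
        return None


def slow_connects(lines: Iterable[str], threshold_ms: int = 10_000) -> List[Entry]:
    """Return CONNECT entries at or above the threshold, slowest first."""
    slow = []
    for line in lines:
        entry = parse_entry(line)
        if entry and entry.method == "CONNECT" and entry.elapsed_ms >= threshold_ms:
            slow.append(entry)
    return sorted(slow, key=lambda e: e.elapsed_ms, reverse=True)


if __name__ == "__main__":
    sample = [
        "1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT "
        "ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -",
        "1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT "
        "www.youtube.com:443/ - DIRECT/www.youtube.com -",
    ]
    for e in slow_connects(sample):
        print(f"{e.elapsed_ms / 1000:8.1f}s  {e.size:>9}  {e.target}")
```

Note that a large elapsed value is not proof of a stall by itself: for a CONNECT tunnel the elapsed time presumably spans the whole life of the tunnel, so the small byte counts next to 30 to 40 second durations are the more telling part.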

As part of figuring out how to diagnose this, I tried a downgrade to 
TrafficServer 6.1.1, and this made all the stalling and problems disappear.  
Afterwards, I did a {{git bisect}} on master, from the branch point of 6.1 to 
the branch point of 6.2, and I ended up at [commit 
af76977|https://git-dual.apache.org/repos/asf?p=trafficserver.git;a=commit;h=af76977adb9f3c0296a232688bbcb5a1421a6768]:
{quote}
Author: Susan Hinrichs <[email protected]>
Date:   Wed Apr 13 19:57:39 2016 +0000

    TS-3612: Restructure client session and transaction processing. This closes 
#570.
{quote}

Unfortunately, this is quite a big refactoring commit, so it is not possible to 
revert it individually to see whether that improves things.

I read TS-3612 and #570, and I saw there were also a number of follow-up 
commits to fix various problems with it, but this particular problem of stalled 
SSL connections is still occurring with master as of today, 2016-09-09.

I realize that this report is still missing reproduction details, since it is 
tricky to analyze what the YouTube app is doing, and simple {{curl https://}} 
tests appear to go fast, and don't seem to trigger any stalling.  But YouTube 
itself is pretty easy to try out, I think.
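
As a possible starting point for a reproduction, the sketch below opens several CONNECT tunnels through the proxy in parallel and times how long each takes to get its response. This is only an assumption about what might differ from a single {{curl}} run: the report does not establish that parallelism matters, nor that the stall happens during tunnel setup. The helper names and the proxy address in the usage comment (127.0.0.1:8080) are hypothetical and need adjusting to the actual setup.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def time_connect(proxy_host: str, proxy_port: int, target: str,
                 timeout: float = 60.0) -> Tuple[str, float]:
    """Send one CONNECT request; return (status line, seconds until the reply)."""
    request = f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")
    start = time.monotonic()
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        response = b""
        # Read until the end of the response headers or until the peer closes.
        while b"\r\n\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    elapsed = time.monotonic() - start
    status_line = response.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, elapsed


def probe(proxy_host: str, proxy_port: int, targets: List[str],
          timeout: float = 60.0) -> List[Tuple[str, str, float]]:
    """Run time_connect for all targets concurrently, one thread per target."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = [pool.submit(time_connect, proxy_host, proxy_port, t, timeout)
                   for t in targets]
        return [(t, *f.result()) for t, f in zip(targets, futures)]


# Example usage (proxy address is an assumption; hosts are taken from the log above):
#   for target, status, secs in probe("127.0.0.1", 8080,
#                                     ["www.youtube.com:443", "i9.ytimg.com:443"]):
#       print(f"{secs:6.2f}s  {status}  {target}")
```

This measures only the time until the proxy answers the CONNECT request, not the TLS handshake or the data transfer inside the tunnel, so a fast result here does not rule out a stall later in the session.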




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
