[ https://issues.apache.org/jira/browse/TS-4838?focusedWorklogId=31730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-31730 ]
ASF GitHub Bot logged work on TS-4838: -------------------------------------- Author: ASF GitHub Bot Created on: 07/Nov/16 17:16 Start Date: 07/Nov/16 17:16 Worklog Time Spent: 10m Work Description: GitHub user PSUdaemon opened a pull request: https://github.com/apache/trafficserver/pull/1206 TS-4838: CONNECT requests get forgotten across threads. What happens here is that ProxyClientTransaction::adjust_thread reschedules the transaction onto a new thread at the start of HttpSM::do_http_server_open. Unfortunately, at this point the default handler is HttpSM::state_raw_http_server_open. When the transaction is rescheduled, the default handler runs, and receives the EVENT_INTERVAL that it so fortuitously logs an error for. We have never actually completed do_http_server_open, so we never make any more progress on this transaction. (cherry picked from commit 8fddd77c085d1a64f11de61bb42a50562cd23229) You can merge this pull request into a Git repository by running: $ git pull https://github.com/PSUdaemon/trafficserver bp-1002 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/trafficserver/pull/1206.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1206 ---- commit c5ab2e686ac0dad4ebe89573cdcc0b2d2a6359a4 Author: James Peach <jpe...@apache.org> Date: 2016-09-09T22:29:05Z TS-4838: CONNECT requests get forgotten across threads. What happens here is that ProxyClientTransaction::adjust_thread reschedules the transaction onto a new thread at the start of HttpSM::do_http_server_open. Unfortunately, at this point the default handler is HttpSM::state_raw_http_server_open. When the transaction is rescheduled, the default handler runs, and receives the EVENT_INTERVAL that it so fortuitously logs an error for. We have never actually completed do_http_server_open, so we never make any more progress on this transaction. (cherry picked from commit 8fddd77c085d1a64f11de61bb42a50562cd23229) ---- Issue Time Tracking ------------------- Worklog Id: (was: 31730) Time Spent: 1h (was: 50m) > After TS-3612 restructuring, very slow SSL sessions and > HttpSM::state_raw_http_server_open errors > ------------------------------------------------------------------------------------------------- > > Key: TS-4838 > URL: https://issues.apache.org/jira/browse/TS-4838 > Project: Traffic Server > Issue Type: Bug > Components: Core, SSL > Affects Versions: 6.2.0, 7.0.0 > Environment: CentOS/RHEL 7.2, x86_64 > Reporter: Dimitry Andric > Assignee: James Peach > Fix For: 7.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > We have been using TrafficServer 5.3.2 for quite some time now, for forward > proxying of a number of different HTML5 applications, one of the most > important ones being YouTube's TV interface, e.g. https://youtube.com/tv. > This is all hosted on CentOS 7.2 x86_64 machines. > We recently upgraded to 6.2.0, and then started having problems with the > CONNECT requests for port 443 which are generated by the YouTube app. It > seems like these connections are "stalled" somehow, sometimes for >10 > seconds. Meanwhile, {{diags.log}} is getting spammed lots of the following: > {noformat} > [Sep 9 16:45:47.683] Server {0x2b3e50c0b700} ERROR: > [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 > server_entry: (nil) > {noformat} > Requests that seem to stall are most likely all of the CONNECT kind, e.g.: > {noformat} > 1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT > ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net - > 1473432382.481 30411 127.0.0.1 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ - > DIRECT/i9.ytimg.com - > 1473432382.486 30417 127.0.0.1 TCP_MISS/200 5389 CONNECT > pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com - > 1473432390.451 42772 127.0.0.1 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/ > - DIRECT/csi.gstatic.com - > 1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT > www.youtube.com:443/ - DIRECT/www.youtube.com - > 1473432390.483 38414 127.0.0.1 TCP_MISS/200 2870983 CONNECT > r17---sn-5hnednl7.googlevideo.com:443/ - > DIRECT/r17---sn-5hnednl7.googlevideo.com - > {noformat} > As part of figuring out how to diagnose this, I tried a downgrade to > TrafficServer 6.1.1, and this made all the stalling and problems disappear. > Afterwards, I did a {{git bisect}} on master, from the branch point of 6.1 to > the branch point of 6.2, and I ended up at [commit > af76977|https://git-dual.apache.org/repos/asf?p=trafficserver.git;a=commit;h=af76977adb9f3c0296a232688bbcb5a1421a6768]: > {quote} > Author: Susan Hinrichs <shinr...@draggingnagging.corp.ne1.yahoo.com> > Date: Wed Apr 13 19:57:39 2016 +0000 > TS-3612: Restructure client session and transaction processing. This > closes #570. > {quote} > Unfortunately, this is a quite big refactoring commit, so it is not possible > to revert it individually to see whether it improves things. > I read TS-3612 and #570, and I saw there were also a number of follow-up > commits to fix various problems with it, but this particular problem of > stalled SSL connections is still occurring with master as of today, > 2016-09-09. > I realize that this report is still missing reproduction details, since it is > tricky to analyze what the YouTube app is doing, and simple {{curl https://}} > tests appear to go fast, and don't seem to trigger any stalling. But YouTube > itself is pretty easy to try out, I think. -- This message was sent by Atlassian JIRA (v6.3.4#6332)