[
https://issues.apache.org/jira/browse/TS-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478130#comment-15478130
]
James Peach commented on TS-4838:
---------------------------------
I was able to reproduce this. Thanks for bisecting this [~dim], that really
helped me narrow it down.
What happens here is that {{ProxyClientTransaction::adjust_thread}} reschedules
the transaction onto a new thread at the start of
{{HttpSM::do_http_server_open}}. At this point the stack looks like this:
{noformat}
[Sep 9 20:09:41.582] Server {0x2b74a450d700} ERROR:
<ProxyClientTransaction.cc:101 (adjust_thread)> adjusting thread for
0x2b74d03511b0
traffic_server - STACK TRACE:
/n/trafficserver/bin/traffic_server(_ZN6HttpSM19do_http_server_openEb+0x141)[0x63855b]
/n/trafficserver/bin/traffic_server(_ZN6HttpSM14set_next_stateEv+0x18a0)[0x6432b8]
/n/trafficserver/bin/traffic_server(_ZN6HttpSM32call_transact_and_set_next_stateEPFvPN12HttpTransact5StateEE+0x1ae)[0x641a10]
/n/trafficserver/bin/traffic_server(_ZN6HttpSM17handle_api_returnEv+0x102)[0x62c52a]
/n/trafficserver/bin/traffic_server(_ZN6HttpSM17state_api_calloutEiPv+0x91d)[0x62c30f]
/n/trafficserver/bin/traffic_server(_ZN6HttpSM18state_api_callbackEiPv+0x1b8)[0x62b97c]
/n/trafficserver/bin/traffic_server(TSHttpTxnReenable+0x146)[0x5343a7]
{noformat}
Unfortunately, at this point the default handler is
{{HttpSM::state_raw_http_server_open}}. When the transaction is rescheduled,
the default handler runs, and receives the {{EVENT_INTERVAL}} that it so
fortuitously logs an error for. We have never actually completed
{{do_http_server_open}}, so we never make any progress.
> After TS-3612 restructuring, very slow SSL sessions and
> HttpSM::state_raw_http_server_open errors
> -------------------------------------------------------------------------------------------------
>
> Key: TS-4838
> URL: https://issues.apache.org/jira/browse/TS-4838
> Project: Traffic Server
> Issue Type: Bug
> Components: Core, SSL
> Affects Versions: 6.2.0, 7.0.0
> Environment: CentOS/RHEL 7.2, x86_64
> Reporter: Dimitry Andric
>
> We have been using TrafficServer 5.3.2 for quite some time now, for forward
> proxying of a number of different HTML5 applications, one of the most
> important ones being YouTube's TV interface, e.g. https://youtube.com/tv.
> This is all hosted on CentOS 7.2 x86_64 machines.
> We recently upgraded to 6.2.0, and then started having problems with the
> CONNECT requests for port 443 which are generated by the YouTube app. It
> seems like these connections are "stalled" somehow, sometimes for >10
> seconds. Meanwhile, {{diags.log}} is getting spammed lots of the following:
> {noformat}
> [Sep 9 16:45:47.683] Server {0x2b3e50c0b700} ERROR:
> [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0
> server_entry: (nil)
> {noformat}
> Requests that seem to stall are most likely all of the CONNECT kind, e.g.:
> {noformat}
> 1473432382.474 30405 127.0.0.1 TCP_MISS/200 4916 CONNECT
> ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -
> 1473432382.481 30411 127.0.0.1 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ -
> DIRECT/i9.ytimg.com -
> 1473432382.486 30417 127.0.0.1 TCP_MISS/200 5389 CONNECT
> pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com -
> 1473432390.451 42772 127.0.0.1 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/
> - DIRECT/csi.gstatic.com -
> 1473432390.459 43833 127.0.0.1 TCP_MISS/200 11610 CONNECT
> www.youtube.com:443/ - DIRECT/www.youtube.com -
> 1473432390.483 38414 127.0.0.1 TCP_MISS/200 2870983 CONNECT
> r17---sn-5hnednl7.googlevideo.com:443/ -
> DIRECT/r17---sn-5hnednl7.googlevideo.com -
> {noformat}
> As part of figuring out how to diagnose this, I tried a downgrade to
> TrafficServer 6.1.1, and this made all the stalling and problems disappear.
> Afterwards, I did a {{git bisect}} on master, from the branch point of 6.1 to
> the branch point of 6.2, and I ended up at [commit
> af76977|https://git-dual.apache.org/repos/asf?p=trafficserver.git;a=commit;h=af76977adb9f3c0296a232688bbcb5a1421a6768]:
> {quote}
> Author: Susan Hinrichs <[email protected]>
> Date: Wed Apr 13 19:57:39 2016 +0000
> TS-3612: Restructure client session and transaction processing. This
> closes #570.
> {quote}
> Unfortunately, this is a quite big refactoring commit, so it is not possible
> to revert it individually to see whether it improves things.
> I read TS-3612 and #570, and I saw there were also a number of follow-up
> commits to fix various problems with it, but this particular problem of
> stalled SSL connections is still occurring with master as of today,
> 2016-09-09.
> I realize that this report is still missing reproduction details, since it is
> tricky to analyze what the YouTube app is doing, and simple {{curl https://}}
> tests appear to go fast, and don't seem to trigger any stalling. But YouTube
> itself is pretty easy to try out, I think.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)