[
https://issues.apache.org/jira/browse/TS-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602535#comment-14602535
]
Sudheer Vinukonda edited comment on TS-3714 at 6/26/15 5:54 PM:
----------------------------------------------------------------
After spending a long, hard week analyzing the issue, I think I finally
understand the problem. It turns out there are multiple issues causing this
behavior, and as far as I can tell this is a serious problem that has impacted
ATS's performance severely.
1. For HTTP (non-SSL), the registration of epoll events (edge-triggered mode)
happens slightly late and can cause events to be missed or delayed if there is
already data sitting in the socket right after accept. This is only a problem
in the path where accept threads are used (specifically,
*UnixNetVConnection::acceptEvent()*), which explains and validates the earlier
observations from [~yzlai] and our operations team. I've managed to reproduce
the race condition and validated that the fix resolves this issue correctly
(see the sketch below).
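To illustrate the race, here is a minimal sketch (not the ATS code; the
function and buffer names are made up). With EPOLLET, bytes that land in the
socket buffer before the fd is registered never generate an edge, so the
registration has to come first and anything already present has to be drained
explicitly:
{code}
// Minimal sketch (hypothetical, not ATS code): with edge-triggered epoll,
// register the fd first, then drain anything that arrived before (or during)
// registration - those bytes will never produce an edge on their own.
// Assumes fd is already non-blocking, as edge-triggered mode requires.
#include <sys/epoll.h>
#include <unistd.h>
#include <errno.h>

static void register_and_drain(int epfd, int fd)
{
  struct epoll_event ev;
  ev.events  = EPOLLIN | EPOLLET; // edge-triggered read events
  ev.data.fd = fd;
  epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);

  // Data may already be sitting in the socket from before registration;
  // read until EAGAIN so the next edge is guaranteed to fire for new bytes.
  char buf[4096];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    // hand the bytes to the session / state machine here
  }
  // n < 0 with errno == EAGAIN means the socket is drained and re-armed.
}
{code}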
2. Now for the more serious issues on the HTTPS (SSL) side:
a). OpenSSL's SSL_read() is not entirely identical to a traditional read(),
where you can rely on select()/epoll() for subsequent reads. Once the socket
buffers are passed over to OpenSSL (by calling SSL_read()) for decryption,
OpenSSL needs to wait until an entire TLS record has been received before it
can decrypt it (see also TS-2503 for more details). Consequently, SSL_read()
may not return all of the data in one go; it returns whatever is buffered and
decryptable at that point. One needs to use the *SSL_pending()* API to check
whether there is pending data to be read after SSL_read(). The issue is that
there are no select/epoll events for that pending data (see the sketch below).
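A minimal sketch of the read pattern this implies, using only stock OpenSSL
calls (this is not ATS's internal read path):
{code}
// Minimal sketch (stock OpenSSL APIs, not ATS's ssl_read_from_net): bytes
// that OpenSSL has already decrypted and buffered never produce another
// epoll event, so SSL_pending() must be drained before returning to the
// event loop.
#include <openssl/ssl.h>

static void read_all_available(SSL *ssl)
{
  char buf[4096];
  int  n;
  // Keep reading while SSL_read() succeeds or decrypted data remains
  // buffered inside OpenSSL (SSL_pending() > 0).
  while ((n = SSL_read(ssl, buf, sizeof(buf))) > 0 || SSL_pending(ssl) > 0) {
    if (n > 0) {
      // hand n decrypted bytes to the session here
    }
  }
  // n <= 0 and nothing pending: check SSL_get_error(ssl, n). WANT_READ means
  // it is safe to wait on epoll again; anything else is a real error.
}
{code}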
b). TS does have a loop calling SSL_read() as long as SSL_get_error() returns
no error, and this works fine in the normal case. However, right after the SSL
handshake, TS currently transfers control to the SSLNextProtocolTrampoline
object to identify the protocol and pick the appropriate acceptor object based
on the protocol selected. When doing this, if there was unread data right
after the handshake, that data is "lost", since there is no new epoll event
for it until a subsequent I/O, which could be really far off (I've seen as
much as 150+ seconds).
The solution to this problem is basically to read the remainder of the
(pending) data before passing control to the trampoline object, which
eventually triggers the acceptor. It's a little tricky, since the iobuf is
presently owned/created by the acceptor object directly (or by the
ProtocolProbeTrampoline in probe scenarios), whereas this likely requires the
SSLNetVC to own the buffer and pass it down to the acceptor objects. A rough
sketch of the intended ordering follows.
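This is hypothetical: *PrereadBuffer* and *hand_off_to_trampoline()* are
invented stand-ins for the SSLNetVC-owned MIOBuffer and the trampoline
handoff, not real ATS APIs. The point is only the ordering.
{code}
#include <openssl/ssl.h>
#include <string>

// Stand-in for a netVC-owned pre-read buffer (hypothetical; the real fix
// would use an MIOBuffer owned by the SSLNetVConnection).
struct PrereadBuffer {
  std::string bytes;
  void append(const char *p, int n) { bytes.append(p, n); }
};

// Invented name: in ATS this is where control moves to
// SSLNextProtocolTrampoline, which then selects the acceptor.
void hand_off_to_trampoline(PrereadBuffer &preread);

void on_handshake_complete(SSL *ssl, PrereadBuffer &preread)
{
  // Drain whatever OpenSSL already decrypted; no epoll event will ever
  // announce these bytes again.
  char tmp[4096];
  while (SSL_pending(ssl) > 0) {
    int n = SSL_read(ssl, tmp, sizeof(tmp));
    if (n <= 0)
      break;
    preread.append(tmp, n);
  }
  // Only now hand control over; the trampoline/acceptor must consume the
  // pre-read bytes before waiting on new I/O.
  hand_off_to_trampoline(preread);
}
{code}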
c). Additionally, there's currently no inactivity timeout during the SSL
handshake, and I've seen a handshake drag on for a very long time (up to 2
minutes) before completing. This is almost pointless, so I will add a new
configurable handshake timer.
d). Further, I'd like to reduce the default for
*proxy.config.http.accept_no_activity_timeout* to a much lower value than 120
seconds - say, 10 seconds. The accept inactivity timer currently only starts
*after* the SSL handshake for TLS connections, hence the need for an
additional handshake timer. It seems better to me to have a separate SSL
handshake timer (which could therefore be configured differently from, and
likely lower than, the accept-no-activity timer), but I'm open to other
suggestions. Together the two knobs would look something like the
records.config sketch below.
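For illustration only: the accept timeout below is the existing setting with a
lowered value, while the handshake-timeout name and default are provisional,
since that setting doesn't exist yet.
{code}
# Existing setting: lower the post-accept inactivity timeout from 120s.
CONFIG proxy.config.http.accept_no_activity_timeout INT 10

# Proposed setting for the SSL handshake timeout; name and default are
# provisional - this knob does not exist yet.
CONFIG proxy.config.ssl.handshake_timeout_in INT 30
{code}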
I have a proof-of-concept patch with the above changes, and it seems to reduce
the long-tail requests considerably. I'll need more time to write a proper
patch and get it through sufficient testing.
> TS seems to stall between ua_begin and ua_first_read on some transactions
> resulting in high response times.
> -----------------------------------------------------------------------------------------------------------
>
> Key: TS-3714
> URL: https://issues.apache.org/jira/browse/TS-3714
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Affects Versions: 5.3.0
> Reporter: Sudheer Vinukonda
> Assignee: Sudheer Vinukonda
>
> An example slow log showing very high *ua_first_read*.
> {code}
> ERROR: [8624075] Slow Request: client_ip: xx.xx.xx.xxx url:
> http://xxxxxxxxxxxxxxxxxx.... status: 200 unique id: bytes: 86 fd: 0
> client state: 0 server state: 9
> ua_begin: 0.000 ua_first_read: 42.224 ua_read_header_done: 42.224
> cache_open_read_begin: 42.224 cache_open_read_end: 42.224
> dns_lookup_begin: 42.224 dns_lookup_end: 42.224
> server_connect: 42.224 server_first_read: 42.229 server_read_header_done: 42.229
> server_close: 42.229 ua_begin_write: 42.229 ua_close: 42.229 sm_finish: 42.229
> {code}
> Initially, I suspected that it might be caused by browsers connecting early,
> before sending any bytes to TS. However, this seems to be happening too
> frequently, and with unrealistically high delay between *ua_begin* and
> *ua_first_read*.
> I suspect it's caused by the code that temporarily disables the read before
> calling the *TXN_START* hook and re-enables it after the API call-out. The
> read disable is done via a 0-byte *do_io_read* on the client VC, but the
> problem is that a valid *mbuf* is still passed. Based on what I am seeing,
> it's possible this actually enables the *read_vio* all the way up to
> *ssl_read_from_net*, for instance (if there's a race condition and bytes had
> already arrived from the client, resulting in an epoll read event), which
> would finally disable the read since the *ntodo* (nbytes) is 0. However,
> this may result in the epoll event being lost until a new I/O happens from
> the client. I'm running further experiments to confirm the theory. In most
> cases, the read buffer already has some bytes by the time the HTTP session
> and HTTP SM are created, which makes it just work. But if there's a slight
> delay in receiving bytes after the connection is made, the epoll mechanism
> should pick them up; due to the way the temporary read disable is done,
> however, the event may be lost (this is because ATS uses epoll's
> edge-triggered mode). A sketch of the suspected pattern follows.
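> To make the suspect pattern concrete (a hedged sketch: *do_io_read()* is the
> real VConnection call, but the surrounding setup is simplified and the
> second variant is only one possible shape of a fix):
> {code}
> // Suspected problematic disable: nbytes == 0 but a live MIOBuffer is
> // passed, so the read machinery may run as far as ssl_read_from_net before
> // noticing ntodo == 0, consuming the edge-triggered epoll event on the way.
> client_vc->do_io_read(this, 0, mbuf);
>
> // A disable that passes no buffer gives the read path nothing to wake up
> // for (one possible shape of a fix, not necessarily the final one).
> client_vc->do_io_read(nullptr, 0, nullptr);
> {code}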
> Some history on this issue:
> This issue has been a problem for a long time and affects both HTTP and
> HTTPS requests. When it was first reported, our router operations team
> eventually closed it, indicating that disabling *accept* threads resolved it
> ([~yzlai] also reported similar observations and conclusions). It's possible
> that disabling accept threads slightly narrows the race-condition window,
> but accept threads are very important to overall performance and throughput.
> I now notice that the issue still exists (regardless of whether accept
> threads are enabled or disabled), and I am testing further to confirm.