Sudheer Vinukonda created TS-3085:
-------------------------------------
Summary: Large POSTs over (relatively) slower connections failing
in ats5
Key: TS-3085
URL: https://issues.apache.org/jira/browse/TS-3085
Project: Traffic Server
Issue Type: Bug
Components: SSL
Reporter: Sudheer Vinukonda
We ran into a production issue where large POSTs (30MB or larger) fail over
slower connections after the ats5 rollout (the problem is easily reproduced
using a Charles proxy with throttling enabled).
Further debugging isolated the issue to uploads over SSL connections. The
root cause appears to be the following:
ATS calls SSL_read() followed by SSL_get_error() to check whether the read
hit an error. This is repeated until either the complete data is read or an
error occurs. However, per the OpenSSL documentation, the current thread's
error queue must be empty before SSL_read() is attempted, so ERR_clear_error()
should be called prior to SSL_read() + SSL_get_error() to ensure the queue
holds no leftover/garbage errors. It's not clear what is leaving stale entries
in the error queue (which is per thread, not per SSL context) in a tight loop -
possibly some new feature in ats5. In any case, calling ERR_clear_error() is a
good idea, and adding it seems to resolve the POST failures.
Documentation from OpenSSL and some related notes on Stack Overflow:
https://www.openssl.org/docs/ssl/SSL_get_error.html
http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
{code}
"SSL_get_error() returns a result code (suitable for the C ``switch''
statement) for a preceding call to SSL_connect(), SSL_accept(),
SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
parameter ret.
In addition to ssl and ret, SSL_get_error() inspects the current thread's
OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that
performed the TLS/SSL I/O operation, and no other OpenSSL function calls should
appear in between. The current thread's error queue must be empty before the
TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably."
"SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error,
the error stays in the queue.
You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write
etc) that is followed by SSL_get_error, otherwise you may be reading an old
error that occurred previously in the current thread."
{code}
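To make the failure mode concrete, here is a minimal toy model of it. It is
not OpenSSL code and not the ATS patch: every toy_* name is hypothetical, and
the model only mimics the behaviour quoted above (SSL_get_error() looks at the
thread's error queue and does not remove what it finds). It assumes a stale
entry was left behind by some unrelated earlier call and that the read itself
merely has no data yet, as on a throttled connection.

```c
/* Toy model of the stale-error-queue problem. NOT OpenSSL code: every
 * toy_* name is hypothetical and only mimics the documented behaviour. */
#include <assert.h>
#include <stddef.h>

enum toy_result {
    TOY_ERROR_NONE,      /* plays the role of SSL_ERROR_NONE */
    TOY_ERROR_WANT_READ, /* plays the role of SSL_ERROR_WANT_READ: retry */
    TOY_ERROR_SSL        /* plays the role of SSL_ERROR_SSL: treated as fatal */
};

#define TOY_QUEUE_MAX 16

/* Stand-in for the per-thread error queue. */
static unsigned long toy_queue[TOY_QUEUE_MAX];
static size_t toy_queue_len = 0;

/* Push an error code, as any earlier library call on this thread might. */
static void toy_put_error(unsigned long code)
{
    assert(toy_queue_len < TOY_QUEUE_MAX);
    toy_queue[toy_queue_len++] = code;
}

/* Stand-in for ERR_clear_error(): empty the queue. */
static void toy_clear_error(void)
{
    toy_queue_len = 0;
}

/* Stand-in for SSL_read() on a slow connection: no data is available yet,
 * and the read itself pushes nothing onto the queue. */
static int toy_read_would_block(void)
{
    return -1;
}

/* Stand-in for SSL_get_error(): for ret <= 0 it inspects the queue first
 * and leaves any entry it finds in place. */
static enum toy_result toy_get_error(int ret)
{
    if (ret > 0)
        return TOY_ERROR_NONE;
    if (toy_queue_len > 0)
        return TOY_ERROR_SSL;      /* a stale entry is mistaken for a failure */
    return TOY_ERROR_WANT_READ;    /* the correct answer for a slow sender */
}

/* One read attempt with a stale entry already sitting in the queue.
 * clear_first != 0 applies the proposed fix before the read. */
static enum toy_result toy_read_once(int clear_first)
{
    int ret;

    toy_clear_error();             /* start from a known state */
    toy_put_error(0x1234UL);       /* stale error from an unrelated call */

    if (clear_first)
        toy_clear_error();         /* the fix: clear before the I/O call */

    ret = toy_read_would_block();
    return toy_get_error(ret);
}
```

In this model toy_read_once(0) reports TOY_ERROR_SSL, so a caller would abort
the upload, while toy_read_once(1) reports TOY_ERROR_WANT_READ and the caller
simply retries. In real code the corresponding step is calling
ERR_clear_error() immediately before each SSL_read().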