[ https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139224#comment-14139224 ]
Sudheer Vinukonda commented on TS-3085:
---------------------------------------
When a POST fails, the log looks like this (slightly enhanced, and traced
with debugging restricted to a single client IP in production):
{code}
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (ssl) [SSL_NetVConnection::ssl_read_from_net] b->write_avail()=32768
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (ssl) [SSL_NetVConnection::ssl_read_from_net] rres=-1
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (ssl.error) [SSL_NetVConnection::ssl_read_from_net] error 1
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (http_tunnel) [510166] producer_handler [user agent post VC_EVENT_ERROR]
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (http_redirect) [HttpTunnel::producer_handler] enable_redirection: [1 0 0] event: 3
[Sep 18 17:09:26.382] Server {0x2ab554605700} DEBUG: (http) [510166] [&HttpSM::tunnel_handler_post_ua, VC_EVENT_ERROR]
{code}
> Large POSTs over (relatively) slower connections failing in ats5
> ----------------------------------------------------------------
>
> Key: TS-3085
> URL: https://issues.apache.org/jira/browse/TS-3085
> Project: Traffic Server
> Issue Type: Bug
> Components: SSL
> Affects Versions: 5.0.1
> Reporter: Sudheer Vinukonda
> Assignee: Sudheer Vinukonda
> Labels: yahoo
> Fix For: 5.2.0
>
>
> We ran into a production issue where large POSTs (30MB or larger) fail
> over slower connections after the ats5 rollout (the problem can be
> easily reproduced using Charles Proxy with throttling enabled).
> Further debugging isolated the issue to uploads over SSL connections, and
> after a lot of debugging the cause appears to be the following:
> ATS calls SSL_read() followed by SSL_get_error() to check whether the read
> failed. This is repeated until either the complete data is read or an
> error occurs. However, the OpenSSL documentation states that the current
> thread's error queue must be empty before the TLS/SSL I/O call, so
> ERR_clear_error() should be called before SSL_read() + SSL_get_error();
> otherwise a leftover error from an earlier call can be reported as a
> failure of this read. It's not clear what is leaving errors in the
> thread's error queue during the read loop (possibly some new feature in
> ats5). In any case, calling ERR_clear_error() is a good idea, and adding
> it appears to resolve the POST failures.
> Documentation from OpenSSL and some related notes on Stack Overflow:
> https://www.openssl.org/docs/ssl/SSL_get_error.html
> http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
> {code}
> "SSL_get_error() returns a result code (suitable for the C ``switch''
> statement) for a preceding call to SSL_connect(), SSL_accept(),
> SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
> returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
> parameter ret.
> In addition to ssl and ret, SSL_get_error() inspects the current thread's
> OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread
> that performed the TLS/SSL I/O operation, and no other OpenSSL function
> calls should appear in between. The current thread's error queue must be
> empty before the TLS/SSL I/O operation is attempted, or SSL_get_error()
> will not work reliably."
>
> "SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error,
> the error stays in the queue.
> You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read,
> SSL_write etc) that is followed by SSL_get_error, otherwise you may be
> reading an old error that occurred previously in the current thread."
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)