[ 
https://issues.apache.org/jira/browse/TS-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143535#comment-14143535
 ] 

Sudheer Vinukonda edited comment on TS-3085 at 9/22/14 6:17 PM:
----------------------------------------------------------------

Added wrappers to the SSL I/O functions and removed the errno-based looping on 
SSL_write per [[email protected]]'s recommendations.

Note that this change removes the logic that reattempted SSL_write on 
transient errors (such as ENOBUFS). I'm not entirely sure that retry logic was 
correct in the first place, since the OpenSSL documentation doesn't indicate 
that errno is set during SSL_write. 
https://www.openssl.org/docs/ssl/SSL_write.html

However, others do appear to check errno after SSL_write (a practice that is 
also discouraged in some of the comments quoted below).

http://fixunix.com/openssl/487926-bug-ssl_error_ssl-eagain-ssl_write.html

http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/apis/sslwrite.html

"This is your problem. You are confusing yourself by checking 'errno'.
SSL_write does not set 'errno' to a useful value, so there is no reason to
check it."

"I've straced the process invoking SSL_write(). Every time I get that
protocol error from SSL_write(), I see that the last write() it invokes
returns -1, with errno set to EAGAIN.

When I send the message and do _not_ get the protocol error, I see that
either: 1) there is no EAGAIN or 2) the last write() sets EAGAIN and
SSL_write() returns SSL_ERROR_WANT_WRITE."


> Large POSTs over (relatively) slower connections failing in ats5
> ----------------------------------------------------------------
>
>                 Key: TS-3085
>                 URL: https://issues.apache.org/jira/browse/TS-3085
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: SSL
>    Affects Versions: 5.0.1
>            Reporter: Sudheer Vinukonda
>            Assignee: Sudheer Vinukonda
>              Labels: yahoo
>             Fix For: 5.2.0
>
>         Attachments: TS-3085.diff
>
>
> We ran into a production issue where large POSTs (30MB or larger) fail 
> over slower connections after the ats5 rollout (the problem can be 
> easily reproduced using Charles proxy with throttling enabled). 
> Further debugging isolated the issue to uploads over SSL connections, and 
> the cause appears to be the following:
> ATS calls SSL_read() followed by SSL_get_error() to check whether the read 
> hit an error. This is repeated until either the complete data is read or 
> an error occurs. However, the OpenSSL documentation requires the current 
> thread's error queue to be empty before SSL_read() + SSL_get_error(), which 
> in practice means calling ERR_clear_error() first to remove any leftover 
> errors. It's not clear what might be leaving stale errors in the queue in a 
> tight loop - possibly some new feature in ats5. In any case, calling 
> ERR_clear_error() is a good idea, and adding it seems to resolve the POST 
> failures.
> Documentation from OpenSSL and some related notes on Stack Overflow:
> https://www.openssl.org/docs/ssl/SSL_get_error.html
> http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error
> {code}
> "SSL_get_error() returns a result code (suitable for the C ``switch''
> statement) for a preceding call to SSL_connect(), SSL_accept(),
> SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
> returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
> parameter ret.
> In addition to ssl and ret, SSL_get_error() inspects the current thread's
> OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread
> that performed the TLS/SSL I/O operation, and no other OpenSSL function calls
> should appear in between. The current thread's error queue must be empty
> before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not
> work reliably."
>
> "SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error,
> the error stays in the queue.
> You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read,
> SSL_write etc) that is followed by SSL_get_error, otherwise you may be reading
> an old error that occurred previously in the current thread."
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
