When building tcnative against OpenSSL 1.1.1, I ran into hangs when
talking TLS 1.0 to Tomcat trunk using that tcnative build plus
Nio(2).
A simple "GET /" request, e.g. sent with curl, hangs for 60 seconds after
a successful TLS handshake; then the client ends with an "empty reply
from server".
You can also reproduce it with openssl s_client. The request hangs
until you send one more empty line (in addition to the usual empty line
ending the request). The extra line then triggers another read, which
finds the old request data and handles it.
The problem does not occur with the APR connector. APR and Nio(2) seem
to use very different code paths in tcnative for TLS handling
(sslnetwork.c versus ssl.c).
I have some understanding of the root cause but currently no good idea
how to fix it. The root cause is incorrect handling of SSL_read when it
returns "0". The OpenSSL man page has a relevant description at [1]. As
also observed in mod_ssl (Apache web server), OpenSSL 1.1.1 behaves
differently from older versions in that it can return "0" where older
versions returned "-1". That was always documented as a possibility, but
it now actually happens in practice. The tcnative code used by APR
handles this in the native part. The code used by Nio(2) simply returns
the value it gets from SSL_read() and leaves it to the calling Java code
to handle it. netty, from which we borrowed the ideas for Java plus
OpenSSL, does include such code in ReferenceCountedOpenSslEngine.java,
especially the SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE handling.
I could have experimented with their approach, but for some reason there
seems to be another problem that makes it harder. The relevant call to
SSL_read() returns "0", but a following SSL_get_error() does not return
WANT_READ or WANT_WRITE. Instead it returns "5", which is
SSL_ERROR_SYSCALL. I do not have a good idea where this comes from.
When tracing system calls, it seems it comes from an EAGAIN in a socket
read, but I am not sure about that.
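To make the case distinction concrete, here is a minimal Java sketch of
how calling code could classify the result of SSL_read() together with
the value from SSL_get_error(). The numeric constants are the
SSL_ERROR_* values from OpenSSL's ssl.h. The class, enum and method
names (SslReadOutcome, Action, classify) and the wouldBlock flag are
made up for illustration and exist neither in tcnative nor in Tomcat.
Treating SSL_ERROR_SYSCALL plus EAGAIN as "retry" is only an assumption
based on the unconfirmed observation above, not established behavior.

```java
// Hypothetical helper for illustration only; not part of tcnative or Tomcat.
final class SslReadOutcome {
    // Values of the SSL_ERROR_* constants as defined in OpenSSL's ssl.h.
    static final int SSL_ERROR_NONE = 0;
    static final int SSL_ERROR_SSL = 1;
    static final int SSL_ERROR_WANT_READ = 2;
    static final int SSL_ERROR_WANT_WRITE = 3;
    static final int SSL_ERROR_SYSCALL = 5;
    static final int SSL_ERROR_ZERO_RETURN = 6;

    enum Action { DATA, RETRY_ON_READ, RETRY_ON_WRITE, CLOSED, FAILED }

    /**
     * @param readResult return value of SSL_read()
     * @param sslError   value of SSL_get_error() for that call
     * @param wouldBlock assumed flag: true if the underlying socket read
     *                   failed with EAGAIN/EWOULDBLOCK
     */
    static Action classify(int readResult, int sslError, boolean wouldBlock) {
        if (readResult > 0) {
            return Action.DATA; // application data was read
        }
        // Both "0" and "-1" are failures; only SSL_get_error() tells us why.
        switch (sslError) {
            case SSL_ERROR_WANT_READ:
                return Action.RETRY_ON_READ;
            case SSL_ERROR_WANT_WRITE:
                return Action.RETRY_ON_WRITE;
            case SSL_ERROR_ZERO_RETURN:
                return Action.CLOSED; // peer sent close_notify
            case SSL_ERROR_SYSCALL:
                // Assumption: a would-block condition is retryable.
                return wouldBlock ? Action.RETRY_ON_READ : Action.FAILED;
            default:
                return Action.FAILED;
        }
    }
}
```

With this classification, the case described above (return value "0",
error "5", EAGAIN on the socket) would be mapped to a retry instead of
being passed up unchanged.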
In our Java code, what happens is a call to unwrap() in OpenSSLEngine.
This call writes, I think, 146 bytes, then checks
pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL()
and gets back "0" (from SSL_read()). Back in unwrap() we then skip the
while loop and finally return with BUFFER_UNDERFLOW. Then we hang,
probably because the data has already been read by OpenSSL and no
further socket event occurs. If I artificially add another call to
pendingReadableBytesInSSL() which triggers another SSL_read(), the hang
does not occur.
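The effect of that extra call can be modelled with a small toy in Java.
This is only a simulation of the observed behavior, not a proposed fix
and not the real OpenSSLEngine code. The names PendingBytesProbe,
probeWithRetry and simulatedSsl are hypothetical. The IntSupplier stands
in for pendingReadableBytesInSSL(), and the simulated SSL object returns
0 on the first probe and the decrypted byte count on later probes,
which is an assumption that matches what I saw.

```java
import java.util.function.IntSupplier;

// Toy model for illustration only; not part of Tomcat or tcnative.
final class PendingBytesProbe {

    /**
     * Calls the probe up to maxAttempts times and returns the first
     * positive result, or 0 if every attempt reported nothing pending.
     * The bound keeps a connection that really has no data from spinning.
     */
    static int probeWithRetry(IntSupplier probe, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            int pending = probe.getAsInt();
            if (pending > 0) {
                return pending;
            }
        }
        return 0;
    }

    /**
     * Simulates the observed behavior: the first probe returns 0,
     * every later probe reports the given number of readable bytes.
     */
    static IntSupplier simulatedSsl(int availableAfterFirstCall) {
        int[] calls = {0};
        return () -> calls[0]++ == 0 ? 0 : availableAfterFirstCall;
    }
}
```

With a single attempt the model reports nothing pending, which
corresponds to the BUFFER_UNDERFLOW path and the hang. With two
attempts the pending bytes are found, which corresponds to the
artificial extra call.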
IMHO TLS 1.0 is not such a big problem, but we should at least document
it when we do a new release.
I might dig deeper into the native layer (checking errno etc.), but I
am not sure I will find the time.
[1]: https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html