Current problems with TLS 1.0 and NIO(2)+native+openssl 1.1.1

Rainer Jung Sun, 25 Nov 2018 01:42:47 -0800

I observed that when building tcnative against OpenSSL 1.1.1 I ran into hangs when talking TLS 1.0 with Tomcat trunk using that tcnative plus Nio(2).

A simple "GET /" request, e.g. sent with curl, hangs for 60 seconds after a successful TLS handshake, then the client ends with an "empty reply from server".

You can also reproduce with openssl s_client. The request will hang until you send an additional empty line (on top of the usual empty line ending the request). The additional one will then trigger another read, which will find the old request data and handle it.

The problem does not occur with the APR connector. APR and Nio(2) seem to use very different code paths in tcnative for TLS handling (sslnetwork.c versus ssl.c).

I have some understanding of the root cause but currently no good idea how to fix it. The root cause is incorrect handling of SSL_read() when it returns "0". The OpenSSL man page has a relevant description at [1]. As also observed in mod_ssl (Apache web server), OpenSSL 1.1.1 behaves differently from older versions in that it can return "0" where old versions returned "-1". That was always documented as a possibility, but now it really happens in practice. The tcnative code used by APR handles this in the native part. The code used by Nio(2) simply returns the value it gets from SSL_read() and leaves it to the calling Java code to handle that. netty, from which we borrowed the ideas for Java plus OpenSSL, does include such code in ReferenceCountedOpenSslEngine.java, especially the SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE handling.
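To make the missing handling concrete, here is a minimal sketch of the kind of decision the Java side would have to make once SSL_read() can return "0". It is not the netty or Tomcat code. The class name, the Action enum and the classify() method are hypothetical; only the numeric SSL_ERROR_* values are taken from OpenSSL's ssl.h.

```java
final class SslReadOutcome {
    // Values mirror the SSL_ERROR_* constants defined in OpenSSL's ssl.h.
    static final int SSL_ERROR_NONE = 0;
    static final int SSL_ERROR_WANT_READ = 2;
    static final int SSL_ERROR_WANT_WRITE = 3;
    static final int SSL_ERROR_SYSCALL = 5;
    static final int SSL_ERROR_ZERO_RETURN = 6;

    // Hypothetical categories, introduced only for this illustration.
    enum Action { DATA, RETRY_LATER, CLOSED, FAILED }

    /**
     * Maps the return value of SSL_read() and the value of a following
     * SSL_get_error() to one of the hypothetical actions above.
     */
    static Action classify(int readResult, int sslError) {
        if (readResult > 0) {
            return Action.DATA; // plaintext bytes were produced
        }
        switch (sslError) {
            case SSL_ERROR_WANT_READ:
            case SSL_ERROR_WANT_WRITE:
                return Action.RETRY_LATER; // not fatal, repeat the call later
            case SSL_ERROR_ZERO_RETURN:
                return Action.CLOSED; // peer sent close_notify
            default:
                return Action.FAILED; // includes SSL_ERROR_SYSCALL
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(0, SSL_ERROR_WANT_READ));
        System.out.println(classify(0, SSL_ERROR_SYSCALL));
    }
}
```

With a mapping like this, a "0" from SSL_read() is only treated as retryable when SSL_get_error() reports WANT_READ or WANT_WRITE. Any other error value falls through to the failure case.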

I could have experimented with their approach, but for some reason there seems to be another problem that makes it harder. The relevant call to SSL_read() returns "0", but a following SSL_get_error() does not return WANT_READ or WANT_WRITE. Instead it returns "5", which is SSL_ERROR_SYSCALL. I do not have a good idea where this comes from. When tracing system calls, it seems to come from an EAGAIN in a socket read, but I am not sure about that.

In our Java code, what happens is a call to unwrap() in OpenSSLEngine. This call writes, I think, 146 bytes, then checks pendingReadableBytesInSSL(). That call in turn calls SSL.readFromSSL() and gets back "0" (from SSL_read()). Up in unwrap() we then skip the while loop and finally return with BUFFER_UNDERFLOW. Then we hang, probably because the data was already read by OpenSSL and no further socket event happens. If I artificially add another call to pendingReadableBytesInSSL(), which triggers another SSL_read(), the hang does not occur.
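The effect of the extra probe can be shown with a toy model. This is only a sketch of the behaviour described above, not Tomcat or tcnative code: FakeSsl, its scripted results, the maxProbes parameter and the byte count 57 are all made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class UnwrapProbeModel {
    /** Hypothetical stand-in for the native SSL object; not tcnative's API. */
    static final class FakeSsl {
        private final Deque<Integer> results = new ArrayDeque<>();

        FakeSsl(int... scripted) {
            for (int r : scripted) {
                results.add(r);
            }
        }

        /** Returns the next scripted SSL_read() result, or 0 once exhausted. */
        int read() {
            return results.isEmpty() ? 0 : results.poll();
        }
    }

    /** Probes up to maxProbes times; returns the first positive count, else 0. */
    static int pendingReadableBytes(FakeSsl ssl, int maxProbes) {
        for (int i = 0; i < maxProbes; i++) {
            int n = ssl.read();
            if (n > 0) {
                return n;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Scripted as in the report: the first read yields 0, the next one
        // yields the plaintext (57 is an arbitrary byte count).
        System.out.println(pendingReadableBytes(new FakeSsl(0, 57), 1)); // prints 0
        System.out.println(pendingReadableBytes(new FakeSsl(0, 57), 2)); // prints 57
    }
}
```

In the model, a single probe sees "0" and the caller would report BUFFER_UNDERFLOW, while a second probe finds the data. Blindly probing twice is what my artificial extra call does and is not meant as a proposed fix.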

IMHO TLS 1.0 is not such a big problem, but we should at least document it when we do a new release.

I might drill down into the native layer with a debugger, checking errno etc., but I am not sure I will find the time.


[1]: https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html
