Csaba Ringhofer created IMPALA-13680:
----------------------------------------
Summary: Some SSL tests hang with OpenSSL 3.2 / RHEL 9.5
Key: IMPALA-13680
URL: https://issues.apache.org/jira/browse/IMPALA-13680
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Csaba Ringhofer
The thread that handles the connection can hang when trying to close the
transport after an SSL handshake error.
The server hangs here:
https://github.com/apache/impala/blob/3118e41c26f730a06d42994e678cab694c787649/be/src/rpc/TAcceptQueueServer.cpp#L111
while the client in BE test KerberosOnAndOff hangs here:
https://github.com/apache/impala/blob/3118e41c26f730a06d42994e678cab694c787649/be/src/rpc/thrift-server-test.cc#L227
In this specific test the Thrift client tries to connect without SSL to a
server that has SSL enabled but not Kerberos. In this case opening the socket
is expected to succeed, while actually sending data should return an error,
because the server expects an SSL handshake while the client sends unrelated
(Thrift) data. Instead, both sides hang until the test is killed by the
timeout (2 hours).
The callstack on server side (based on the minidump) is:
{code}
apache::thrift::transport::TSSLSocket::waitForEvent(bool) [TSSLSocket.cpp : 881 + 0xa]
apache::thrift::transport::TSSLSocket::initializeHandshake() [TSSLSocket.cpp : 683 + 0x12]
apache::thrift::transport::TSSLSocket::flush() [TSSLSocket.cpp : 613 + 0x5]
apache::thrift::transport::TBufferedTransport::close() [TBufferTransports.cpp : 133 + 0x3]
apache::thrift::server::TAcceptQueueServer::Task::run() [TAcceptQueueServer.cpp : 111 + 0x3]
{code}
Note that Impala uses Thrift 0.16.0 with a few patches, so the line numbers may
not match the original Thrift source code.
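To make the frame order easier to follow, here is a simplified model of the call chain in the stack above. It is only a sketch: the classes ModelSslSocket and ModelBufferedTransport and the helper TraceClose are hypothetical stand-ins, not the real Thrift code, and the "handshake completed" flag that short-circuits the wait is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Simplified model only: hypothetical stand-ins, not the real Thrift
// TSSLSocket / TBufferedTransport implementations.
class ModelSslSocket {
 public:
  explicit ModelSslSocket(std::vector<std::string>* trace) : trace_(trace) {}

  // Assumption: a flag records whether a handshake has already finished.
  void set_handshake_completed(bool done) { handshake_completed_ = done; }

  // Mirrors the frame order in the stack: flush() enters the handshake path.
  void flush() {
    trace_->push_back("flush");
    initializeHandshake();
  }

 private:
  void initializeHandshake() {
    trace_->push_back("initializeHandshake");
    if (handshake_completed_) return;
    // In the reported hang the thread is stuck at this point; the model only
    // records the call instead of blocking.
    waitForEvent();
  }

  void waitForEvent() { trace_->push_back("waitForEvent"); }

  std::vector<std::string>* trace_;
  bool handshake_completed_ = false;
};

class ModelBufferedTransport {
 public:
  ModelBufferedTransport(ModelSslSocket* sock, std::vector<std::string>* trace)
      : sock_(sock), trace_(trace) {}

  // close() flushes the underlying socket, as the stack trace shows.
  void close() {
    trace_->push_back("close");
    sock_->flush();
  }

 private:
  ModelSslSocket* sock_;
  std::vector<std::string>* trace_;
};

// Returns the sequence of calls made when the transport is closed.
std::vector<std::string> TraceClose(bool handshake_completed) {
  std::vector<std::string> trace;
  ModelSslSocket sock(&trace);
  sock.set_handshake_completed(handshake_completed);
  ModelBufferedTransport transport(&sock, &trace);
  transport.close();
  return trace;
}
```

In this model TraceClose(false) ends in waitForEvent, which is the frame where the server thread is stuck, while TraceClose(true) never reaches it.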
The hang seems to happen only when the failure is "SSL_accept: wrong version
number (SSL_error_code = 1)". Several other tests also lead to a failed
SSL_accept but finish correctly (e.g. "SSL_accept: unsupported protocol
(SSL_error_code = 1)").
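As a rough illustration of why plaintext Thrift data is likely to be rejected as "wrong version number": a TLS handshake record starts with content type 0x16 followed by a version whose major byte is 0x03, whereas a Thrift message in the strict binary protocol starts with 0x80 0x01. The sketch below checks only those leading bytes. LooksLikeTlsHandshakeRecord is a hypothetical helper, not part of Impala, Thrift or OpenSSL, and it assumes the client uses the strict binary protocol over a plain buffered transport.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration; not an Impala, Thrift or OpenSSL API.
// Returns true if the first bytes on a connection look like the start of a
// TLS record carrying a handshake message: content type 0x16 followed by a
// protocol version whose major byte is 0x03.
bool LooksLikeTlsHandshakeRecord(const uint8_t* buf, size_t len) {
  // A record header needs at least the type byte and the two version bytes.
  if (buf == nullptr || len < 3) return false;
  const uint8_t kHandshakeContentType = 0x16;
  const uint8_t kTlsMajorVersion = 0x03;
  return buf[0] == kHandshakeContentType && buf[1] == kTlsMajorVersion;
}
```

A ClientHello beginning 0x16 0x03 0x01 passes this check, while a Thrift call beginning 0x80 0x01 0x00 0x01 does not, so the server would be parsing bytes that were never a TLS record.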