Csaba Ringhofer created IMPALA-13680:
----------------------------------------

             Summary: Some SSL tests hang with OpenSSL 3.2 on RHEL 9.5
                 Key: IMPALA-13680
                 URL: https://issues.apache.org/jira/browse/IMPALA-13680
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Csaba Ringhofer


The thread that handles the connection can hang when trying to close the 
transport after an SSL handshake error.
The server hangs here:
https://github.com/apache/impala/blob/3118e41c26f730a06d42994e678cab694c787649/be/src/rpc/TAcceptQueueServer.cpp#L111
while the client in BE test KerberosOnAndOff hangs here:
https://github.com/apache/impala/blob/3118e41c26f730a06d42994e678cab694c787649/be/src/rpc/thrift-server-test.cc#L227

In this specific test the Thrift client tries to connect without SSL to a 
server that has SSL enabled but not Kerberos. In this case opening the socket 
is expected to succeed, while actually sending data should return an error, 
because the server expects an SSL handshake while the client sends unrelated 
(Thrift) data. Instead, the test hangs until it is killed by the timeout 
(2 hours).

The callstack on server side (based on the minidump) is:
{code}
apache::thrift::transport::TSSLSocket::waitForEvent(bool) [TSSLSocket.cpp : 881 + 0xa]
apache::thrift::transport::TSSLSocket::initializeHandshake() [TSSLSocket.cpp : 683 + 0x12]
apache::thrift::transport::TSSLSocket::flush() [TSSLSocket.cpp : 613 + 0x5]
apache::thrift::transport::TBufferedTransport::close() [TBufferTransports.cpp : 133 + 0x3]
apache::thrift::server::TAcceptQueueServer::Task::run() [TAcceptQueueServer.cpp : 111 + 0x3]
{code}

Note that Impala uses Thrift 0.16.0 with a few patches, so line numbers may 
not match the original Thrift source code.

The hang seems to happen only when the failure is "SSL_accept: wrong version 
number (SSL_error_code = 1)". Several other tests also lead to a failed 
SSL_accept but finish correctly (e.g. "SSL_accept: unsupported protocol 
(SSL_error_code = 1)").
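
A rough illustration of why a plain Thrift client could plausibly trigger the 
"wrong version number" failure in particular. This is only a sketch under 
assumptions: it assumes the client uses the strict Thrift binary protocol with 
no framing or SASL in front of it, and it assumes the server classifies the 
first bytes by reading them as a TLS record header. The method name "ping" and 
all helper names are hypothetical and not taken from Impala.

```python
import struct

# Thrift strict binary protocol: the first 4 bytes are VERSION_1 | message type.
THRIFT_VERSION_1 = 0x80010000
THRIFT_CALL = 1

# A TLS record starts with content type (1 byte), version (2 bytes), length (2 bytes).
TLS_HANDSHAKE_CONTENT_TYPE = 22  # 0x16
TLS_RECORD_MAJOR_VERSION = 3     # 0x03 for SSL 3.0 and all TLS versions


def thrift_call_header(method_name, seqid=0):
    """Encode the beginning of a strict-binary Thrift CALL message."""
    name = method_name.encode("utf-8")
    return (
        struct.pack(">I", THRIFT_VERSION_1 | THRIFT_CALL)
        + struct.pack(">i", len(name))
        + name
        + struct.pack(">i", seqid)
    )


def parse_as_tls_record_header(data):
    """Interpret the first 5 bytes the way a TLS record header is laid out."""
    content_type, major, minor, length = struct.unpack(">BBBH", data[:5])
    return {"content_type": content_type, "major": major,
            "minor": minor, "length": length}


def looks_like_tls_handshake(data):
    """True if the first bytes could be the start of a TLS handshake record."""
    header = parse_as_tls_record_header(data)
    return (header["content_type"] == TLS_HANDSHAKE_CONTENT_TYPE
            and header["major"] == TLS_RECORD_MAJOR_VERSION)


if __name__ == "__main__":
    payload = thrift_call_header("ping")  # hypothetical method name
    print(payload[:5].hex(" "))
    print(parse_as_tls_record_header(payload))
    print("looks like TLS handshake:", looks_like_tls_handshake(payload))
```

Read as a TLS record header, the Thrift bytes carry a major version of 1 
rather than 3, which is consistent with a version error being reported instead 
of a negotiation error such as "unsupported protocol". Whether this difference 
is what leaves the transport in a state where close() blocks is not 
established here.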




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
