[ https://issues.apache.org/jira/browse/IMPALA-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17921680#comment-17921680 ]

ASF subversion and git services commented on IMPALA-13680:
----------------------------------------------------------

Commit b1a985be5eb49db6f23912a1439eeb59d74a278e in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b1a985be5 ]

IMPALA-13680: Avoid flush() when closing TSSLSocket

Closing the transports could hang in TAcceptQueueServer if there was
an error during the SSL handshake. Because the TSSLSocket is wrapped in
a TBufferedTransport and TBufferedTransport::close() calls flush(),
TSSLSocket::flush() was also called, which retried the handshake in
an unclean state. With OpenSSL 3.2 this retry hung indefinitely.
Another potential problem is that if flush() throws an exception,
the underlying TTransport's close() won't be called.

Ideally this would be solved in Thrift (THRIFT-5846). As a quick
fix, this change adds a subclass of TBufferedTransport whose close()
doesn't call flush(). This is safe because generated TProcessor
subclasses call flush() every time the client/server sends
a message.
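
The idea can be illustrated with a minimal, self-contained sketch. It does
not use the real Thrift headers: Transport, FailingSocket, BufferedTransport,
NoFlushBufferedTransport and closeThrows are hypothetical stand-ins invented
for this illustration, and the actual Impala subclass may look different. The
sketch only models the two behaviours described above: a close() that flushes
first and therefore never reaches the wrapped transport's close() when flush()
fails, and a subclass whose close() skips flush().

```cpp
#include <exception>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a Thrift transport; NOT the real Thrift API.
struct Transport {
  virtual ~Transport() = default;
  virtual void flush() = 0;
  virtual void close() = 0;
};

// Models a socket whose flush() fails after a broken SSL handshake.
// (In the reported bug flush() hung instead; throwing keeps the sketch finite.)
struct FailingSocket : Transport {
  bool closed = false;
  void flush() override { throw std::runtime_error("handshake failed"); }
  void close() override { closed = true; }
};

// Models the wrapper behaviour described in the commit message:
// close() flushes before closing the wrapped transport.
struct BufferedTransport : Transport {
  explicit BufferedTransport(std::shared_ptr<Transport> inner)
      : inner_(std::move(inner)) {}
  void flush() override { inner_->flush(); }
  void close() override {
    flush();          // may throw (or block), so the next line may never run
    inner_->close();
  }

 protected:
  std::shared_ptr<Transport> inner_;
};

// Sketch of the workaround: close the wrapped transport without flushing.
struct NoFlushBufferedTransport : BufferedTransport {
  using BufferedTransport::BufferedTransport;
  void close() override { inner_->close(); }
};

// Helper for the illustration: reports whether close() threw.
bool closeThrows(Transport& t) {
  try {
    t.close();
    return false;
  } catch (const std::exception&) {
    return true;
  }
}
```

With these stand-ins, closing a BufferedTransport around a FailingSocket
throws and leaves the socket open, while closing a NoFlushBufferedTransport
around the same kind of socket closes it without throwing.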

Testing:
- the issue was caught by thrift-server-test/KerberosOnAndOff
  and TestClientSsl::test_ssl, which hung until they were killed

Change-Id: I4879a1567f7691711d73287269bf87f2946e75d2
Reviewed-on: http://gerrit.cloudera.org:8080/22368
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>


> Some SSL tests hang with OpenSsl 3.2 .RHEL 9.5
> ----------------------------------------------
>
>                 Key: IMPALA-13680
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13680
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Priority: Critical
>
> The thread that handles the connection can hang when trying to close the 
> transport after an SSL handshake error.
> The server hangs here:
> https://github.com/apache/impala/blob/3118e41c26f730a06d42994e678cab694c787649/be/src/rpc/TAcceptQueueServer.cpp#L111
> while the client in BE test KerberosOnAndOff hangs here:
> https://github.com/apache/impala/blob/3118e41c26f730a06d42994e678cab694c787649/be/src/rpc/thrift-server-test.cc#L227
> In this specific test the Thrift client tries to connect without SSL to a 
> server that has SSL enabled but not Kerberos. In this case opening the socket 
> is expected to succeed, while actually sending data should return an 
> error, because the server expects an SSL handshake while the client sends 
> unrelated (Thrift) data. Instead, the test hangs until it is killed by the 
> timeout (2 hours).
> The callstack on the server side (based on the minidump) is:
> {code}
> apache::thrift::transport::TSSLSocket::waitForEvent(bool) [TSSLSocket.cpp : 881 + 0xa]
> apache::thrift::transport::TSSLSocket::initializeHandshake() [TSSLSocket.cpp : 683 + 0x12]
> apache::thrift::transport::TSSLSocket::flush() [TSSLSocket.cpp : 613 + 0x5]
> apache::thrift::transport::TBufferedTransport::close() [TBufferTransports.cpp : 133 + 0x3]
> apache::thrift::server::TAcceptQueueServer::Task::run() [TAcceptQueueServer.cpp : 111 + 0x3]
> {code}
> Note that Impala uses Thrift 0.16.0 with a few patches, so line numbers may 
> not match the original Thrift source code.
> The hang seems to happen only when the failure is "SSL_accept: wrong version 
> number (SSL_error_code = 1)". Several other tests lead to a 
> failed SSL_accept and finish correctly (e.g. "SSL_accept: unsupported 
> protocol (SSL_error_code = 1)").


