[ https://issues.apache.org/jira/browse/THRIFT-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914901#comment-17914901 ]
Csaba Ringhofer commented on THRIFT-5846: ----------------------------------------- >It would also make sense to skip flushing in TBufferedTransport::close() if >the underlying transport is not open. Thought a bit more about this - actually I think that TBufferedTransport::close() should not call flush at all. Checked TBufferedTransport for other languages and close() simply calls the close() of the underlying transport: c#: https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/netstd/Thrift/Transport/Layered/TBufferedTransport.cs#L79 go: https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/go/thrift/buffered_transport.go#L63 python: https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/py/src/transport/TTransport.py#L158 I would expect close() to be callable on a transport after error, but not flush(). TConnectedClient calls close() even after error, similarly to Impala's TAcceptQueueServer: https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/server/TConnectedClient.cpp#L100 Calling flush() is necessary when requesting/responding RPCs, and TProcessor implementations contain these flush() calls, so the only case when there should be unflushed data during cleanup is when there was an error/the connection was closed by the client. In these cases the client shouldn't expect more data and flush can only do harm. > Hang when closing TBufferedTransport + TSslTransport > ---------------------------------------------------- > > Key: THRIFT-5846 > URL: https://issues.apache.org/jira/browse/THRIFT-5846 > Project: Thrift > Issue Type: Bug > Components: C++ - Library > Reporter: Csaba Ringhofer > Priority: Major > > The issue occurred in Impala tests when switching to OpenSsl 3.2 > (IMPALA-13680). > While the root cause of the regression may be some change in OpenSsl, > Thrift's behavior seems also strange as it tries to do SSL handshake when > closing the transport: > {code} > apache::thrift::transport::TSSLSocket::waitForEvent(bool) [TSSLSocket.cpp : > 881 + 0xa] > apache::thrift::transport::TSSLSocket::initializeHandshake() [TSSLSocket.cpp > : 683 + 0x12] > apache::thrift::transport::TSSLSocket::flush() [TSSLSocket.cpp : 613 + 0x5] > apache::thrift::transport::TBufferedTransport::close() [TBufferTransports.cpp > : 133 + 0x3] > apache::thrift::server::TAcceptQueueServer::Task::run() > [TAcceptQueueServer.cpp : 111 + 0x3] > {code} > Note that Impala uses Thrift 0.16.0 with a few patches so line numbers may > not match with the original Thrift source code, but the Impala changes do not > seem relevant in this case. > The timeline of the events is: > 1. TSSLSocket wrapped in TBufferedTransport is created and opened > 2. the first TSSLSocket::initializeHandshake() fails and leads to throwing an > TTransportException with "SSL_accept: wrong version number (SSL_error_code = > 1)" > 3. Impala tries to close the transport > 4. as the transport is TBufferedTransport, it calls flush on close() > https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TBufferTransports.h#L236 > 5. TBufferedTransport::flush() calls the underlying transport's flush() > https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TBufferTransports.cpp#L134 > 6. TSSlSocket::flush() calls initializeHandshake() > https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L586 > The hang only happens with the "wrong version number" error, other SSL_accept > errors were handled as before (e.g. "unsupported protocol"). > My expectation is that if the transport's open() was successful then calling > close() should be safe, even if a later function throws an exception. In case > of TSSLSocket open() just opens the socket and the first read()/write() will > do the actual SSL handshake. If the handshake fails then the TSSLSocket > enters a strange state where isOpen() returns true but the ssl_ object is not > usable. > I think that a possible solution is to shutdown+free ssl_ if the handshake > failed: > https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L698 > This means that isOpen() would return false after the error + > TSSLSocket::flush() would not try to reestablish connection. It would also > make sense to skip flushing in TBufferedTransport::close() if the underlying > transport is not open. -- This message was sent by Atlassian Jira (v8.20.10#820010)