[ 
https://issues.apache.org/jira/browse/THRIFT-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914901#comment-17914901
 ] 

Csaba Ringhofer commented on THRIFT-5846:
-----------------------------------------

>It would also make sense to skip flushing in TBufferedTransport::close() if 
>the underlying transport is not open.
Thought a bit more about this - actually I think that 
TBufferedTransport::close() should not call flush at all.
Checked TBufferedTransport for other languages and close() simply calls the 
close() of the underlying transport:
c#: 
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/netstd/Thrift/Transport/Layered/TBufferedTransport.cs#L79
go: 
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/go/thrift/buffered_transport.go#L63
python: 
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/py/src/transport/TTransport.py#L158

I would expect close() to be callable on a transport after error, but not 
flush(). TConnectedClient calls close() even after error, similarly to Impala's 
TAcceptQueueServer:
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/server/TConnectedClient.cpp#L100

Calling flush() is necessary when requesting/responding RPCs, and TProcessor 
implementations contain these flush() calls, so the only case when there should 
be unflushed data during cleanup is when there was an error/the connection was 
closed by the client. In these cases the client shouldn't expect more data and 
flush can only do harm.

> Hang when closing TBufferedTransport + TSslTransport
> ----------------------------------------------------
>
>                 Key: THRIFT-5846
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5846
>             Project: Thrift
>          Issue Type: Bug
>          Components: C++ - Library
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> The issue occurred in Impala tests when switching to OpenSsl 3.2 
> (IMPALA-13680).
> While the root cause of the regression may be some change in OpenSsl, 
> Thrift's behavior seems also strange as it tries to do SSL handshake when 
> closing the transport:
> {code}
> apache::thrift::transport::TSSLSocket::waitForEvent(bool)  [TSSLSocket.cpp : 
> 881 + 0xa]
> apache::thrift::transport::TSSLSocket::initializeHandshake() [TSSLSocket.cpp 
> : 683 + 0x12]
> apache::thrift::transport::TSSLSocket::flush() [TSSLSocket.cpp : 613 + 0x5]
> apache::thrift::transport::TBufferedTransport::close() [TBufferTransports.cpp 
> : 133 + 0x3]
> apache::thrift::server::TAcceptQueueServer::Task::run() 
> [TAcceptQueueServer.cpp : 111 + 0x3]
> {code}
> Note that Impala uses Thrift 0.16.0 with a few patches so line numbers may 
> not match with the original Thrift source code, but the Impala changes do not 
> seem relevant in this case.
> The timeline of the events is:
> 1. TSSLSocket wrapped in TBufferedTransport is created and opened
> 2. the first TSSLSocket::initializeHandshake() fails and leads to throwing an 
> TTransportException with "SSL_accept: wrong version number (SSL_error_code = 
> 1)"
> 3. Impala tries to close the transport
> 4. as the transport is TBufferedTransport, it calls flush on close() 
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TBufferTransports.h#L236
> 5. TBufferedTransport::flush() calls the underlying transport's flush() 
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TBufferTransports.cpp#L134
> 6. TSSlSocket::flush() calls initializeHandshake() 
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L586
> The hang only happens with the "wrong version number" error, other SSL_accept 
> errors were handled as before (e.g. "unsupported protocol").
> My expectation is that if the transport's open() was successful then calling 
> close() should be safe, even if a later function throws an exception. In case 
> of TSSLSocket open() just opens the socket and the first read()/write() will 
> do the actual SSL handshake. If the handshake fails then the TSSLSocket 
> enters a strange state where isOpen() returns true but the ssl_ object is not 
> usable.
> I think that a possible solution is to shutdown+free ssl_ if the handshake 
> failed:
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L698
> This means that isOpen() would return false after the error + 
> TSSLSocket::flush() would not try to reestablish connection. It would also 
> make sense to skip flushing in TBufferedTransport::close() if the underlying 
> transport is not open.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to