Thought I fixed this with 507368d035bce23d35832fb3c9dbae9158ba452d, but after 
wider deployment of that fix, we got another crash on a freed client_vc.  
Another look at the ProxyClientSession::handle_api_return, shows that I missed 
the very obvious call to vc->do_io_close() before ProxyClientSession::free() 
which will trigger the exactly same freed client vc with stale reference.  So 
the original fix is good but not sufficient.

Here is our most recent stack trace
```
#0  0x0000000000000000 in ?? ()
#1  0x00000000006c5c43 in Http2ClientSession::free (this=0x2af4403bdc60) at 
../../../../trafficserver/proxy/http2/Http2ClientSession.cc:94
#2  0x00000000005bd5c9 in ProxyClientSession::handle_api_return 
(this=0x2af4403bdc60, event=0) at 
../../../trafficserver/proxy/ProxyClientSession.cc:212
#3  0x00000000005bd3bc in ProxyClientSession::state_api_callout 
(this=0x2af4403bdc60, event=0, data=0x0) at 
../../../trafficserver/proxy/ProxyClientSession.cc:156
#4  0x00000000005bd4a6 in ProxyClientSession::do_api_callout 
(this=0x2af4403bdc60, id=TS_HTTP_SSN_CLOSE_HOOK) at 
../../../trafficserver/proxy/ProxyClientSession.cc:182
#5  0x00000000006c5bc8 in Http2ClientSession::destroy (this=0x2af4403bdc60) at 
../../../../trafficserver/proxy/http2/Http2ClientSession.cc:82
#6  0x00000000006ce838 in Http2ConnectionState::release_stream 
(this=0x2af4403bdf00, stream=0x2af30c262660) at 
../../../../trafficserver/proxy/http2/Http2ConnectionState.cc:1246
#7  0x00000000006bf399 in Http2Stream::destroy (this=0x2af30c262660) at 
../../../../trafficserver/proxy/http2/Http2Stream.cc:716
#8  0x00000000006bd6fd in Http2Stream::terminate_if_possible 
(this=0x2af30c262660) at 
../../../../trafficserver/proxy/http2/Http2Stream.cc:377
#9  0x00000000006bd63b in Http2Stream::transaction_done (this=0x2af30c262660) 
at ../../../../trafficserver/proxy/http2/Http2Stream.cc:366
#10 0x000000000066a16b in HttpSM::kill_this (this=0x2ae8e8113900) at 
../../../../trafficserver/proxy/http/HttpSM.cc:7078
#11 0x000000000065b2af in HttpSM::main_handler (this=0x2ae8e8113900, 
event=2301, data=0x2ae8e8114c80) at 
../../../../trafficserver/proxy/http/HttpSM.cc:2826
#12 0x0000000000831494 in Continuation::dispatchEvent (this=0x2ae8e8113900, 
event=2301, data=0x2ae8e8114c80) at 
../../../../trafficserver/iocore/eventsystem/Continuation.cc:46
#13 0x00000000006b5f32 in HttpTunnel::main_handler (this=0x2ae8e8114c80, 
event=104, data=0x2aea980a5bb0) at 
../../../../trafficserver/proxy/http/HttpTunnel.cc:1650
#14 0x00000000008313ad in Continuation::handleEvent (this=0x2ae8e8114c80, 
event=104, data=0x2aea980a5bb0) at 
../../../../trafficserver/iocore/eventsystem/Continuation.cc:33
#15 0x000000000080e35a in read_signal_and_update (event=104, vc=0x2aea980a5a60) 
at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:144
#16 0x000000000080e6d5 in read_signal_done (event=104, nh=0x2ae87541dcf0, 
vc=0x2aea980a5a60) at 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:205
#17 0x00000000008113d2 in UnixNetVConnection::readSignalDone 
(this=0x2aea980a5a60, event=104, nh=0x2ae87541dcf0) at 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:1096
#18 0x00000000007e91e4 in SSLNetVConnection::net_read_io (this=0x2aea980a5a60, 
nh=0x2ae87541dcf0, lthread=0x2ae87541a010) at 
../../../../trafficserver/iocore/net/SSLNetVConnection.cc:649
#19 0x0000000000806443 in NetHandler::waitForActivity (this=0x2ae87541dcf0, 
timeout=60000000) at ../../../../trafficserver/iocore/net/UnixNet.cc:497
#20 0x0000000000833b3a in EThread::execute_regular (this=0x2ae87541a010) at 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:273
#21 0x0000000000833d01 in EThread::execute (this=0x2ae87541a010) at 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:326
#22 0x0000000000832936 in spawn_thread_internal (a=0x2092df0) at 
../../../../trafficserver/iocore/eventsystem/Thread.cc:85
#23 0x00002ae87314fdc5 in start_thread () from /lib64/libpthread.so.0
#24 0x00002ae873e7e76d in clone () from /lib64/libc.so.6
```

Looking at frame 1, client_vc is non-null, but it has been freed (its vtable 
pointer is bogus).  Looking back at ProxyClientSession::handle_api_return, were 
we removed the call to net_vc, we see there is still a problem.

```
     NetVConnection *vc = this->get_netvc();
     if (vc) {
       vc->do_io_close();
     }
     free(); // You can now clean things up
     break;
```

We free the vc the call to do_io_close because we came in on the other vc, so 
this vc does not have its recursion count bumped.  The Http2ClientSession still 
has a stale reference to the vc via its client_vc member, so then bad things 
can happen if that vc has already been freed. 

Http2CientSession and Http1ClientSession both call do_io_close and clear the 
client VC in their ::do_io_close methods.  But for their ::free methods only 
Http2ClientSession calls do_io_close on the client vc.  Http1ClientSession does 
not.

Either both should or neither should.  We cannot safely do both in 
handle_api_return.  After calling ::free the session is gone, so we cannot 
reach in and clean up the VC afterwards.  If we continue with the early 
vc->do_io_close, we would have to make another call to set the VC to null 
before calling into ::free.

At this point, it would be cleanest just to defer the client vc clean up to the 
Http*ClientSession::free logic, which is what I propose in this PR.  

[ Full content available at: https://github.com/apache/trafficserver/pull/4178 ]
This message was relayed via gitbox.apache.org for [email protected]

Reply via email to