Thought I fixed this with 507368d035bce23d35832fb3c9dbae9158ba452d, but after
wider deployment of that fix, we got another crash on a freed client_vc.
Another look at the ProxyClientSession::handle_api_return, shows that I missed
the very obvious call to vc->do_io_close() before ProxyClientSession::free()
which will trigger the exactly same freed client vc with stale reference. So
the original fix is good but not sufficient.
Here is our most recent stack trace
```
#0 0x0000000000000000 in ?? ()
#1 0x00000000006c5c43 in Http2ClientSession::free (this=0x2af4403bdc60) at
../../../../trafficserver/proxy/http2/Http2ClientSession.cc:94
#2 0x00000000005bd5c9 in ProxyClientSession::handle_api_return
(this=0x2af4403bdc60, event=0) at
../../../trafficserver/proxy/ProxyClientSession.cc:212
#3 0x00000000005bd3bc in ProxyClientSession::state_api_callout
(this=0x2af4403bdc60, event=0, data=0x0) at
../../../trafficserver/proxy/ProxyClientSession.cc:156
#4 0x00000000005bd4a6 in ProxyClientSession::do_api_callout
(this=0x2af4403bdc60, id=TS_HTTP_SSN_CLOSE_HOOK) at
../../../trafficserver/proxy/ProxyClientSession.cc:182
#5 0x00000000006c5bc8 in Http2ClientSession::destroy (this=0x2af4403bdc60) at
../../../../trafficserver/proxy/http2/Http2ClientSession.cc:82
#6 0x00000000006ce838 in Http2ConnectionState::release_stream
(this=0x2af4403bdf00, stream=0x2af30c262660) at
../../../../trafficserver/proxy/http2/Http2ConnectionState.cc:1246
#7 0x00000000006bf399 in Http2Stream::destroy (this=0x2af30c262660) at
../../../../trafficserver/proxy/http2/Http2Stream.cc:716
#8 0x00000000006bd6fd in Http2Stream::terminate_if_possible
(this=0x2af30c262660) at
../../../../trafficserver/proxy/http2/Http2Stream.cc:377
#9 0x00000000006bd63b in Http2Stream::transaction_done (this=0x2af30c262660)
at ../../../../trafficserver/proxy/http2/Http2Stream.cc:366
#10 0x000000000066a16b in HttpSM::kill_this (this=0x2ae8e8113900) at
../../../../trafficserver/proxy/http/HttpSM.cc:7078
#11 0x000000000065b2af in HttpSM::main_handler (this=0x2ae8e8113900,
event=2301, data=0x2ae8e8114c80) at
../../../../trafficserver/proxy/http/HttpSM.cc:2826
#12 0x0000000000831494 in Continuation::dispatchEvent (this=0x2ae8e8113900,
event=2301, data=0x2ae8e8114c80) at
../../../../trafficserver/iocore/eventsystem/Continuation.cc:46
#13 0x00000000006b5f32 in HttpTunnel::main_handler (this=0x2ae8e8114c80,
event=104, data=0x2aea980a5bb0) at
../../../../trafficserver/proxy/http/HttpTunnel.cc:1650
#14 0x00000000008313ad in Continuation::handleEvent (this=0x2ae8e8114c80,
event=104, data=0x2aea980a5bb0) at
../../../../trafficserver/iocore/eventsystem/Continuation.cc:33
#15 0x000000000080e35a in read_signal_and_update (event=104, vc=0x2aea980a5a60)
at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:144
#16 0x000000000080e6d5 in read_signal_done (event=104, nh=0x2ae87541dcf0,
vc=0x2aea980a5a60) at
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:205
#17 0x00000000008113d2 in UnixNetVConnection::readSignalDone
(this=0x2aea980a5a60, event=104, nh=0x2ae87541dcf0) at
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:1096
#18 0x00000000007e91e4 in SSLNetVConnection::net_read_io (this=0x2aea980a5a60,
nh=0x2ae87541dcf0, lthread=0x2ae87541a010) at
../../../../trafficserver/iocore/net/SSLNetVConnection.cc:649
#19 0x0000000000806443 in NetHandler::waitForActivity (this=0x2ae87541dcf0,
timeout=60000000) at ../../../../trafficserver/iocore/net/UnixNet.cc:497
#20 0x0000000000833b3a in EThread::execute_regular (this=0x2ae87541a010) at
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:273
#21 0x0000000000833d01 in EThread::execute (this=0x2ae87541a010) at
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:326
#22 0x0000000000832936 in spawn_thread_internal (a=0x2092df0) at
../../../../trafficserver/iocore/eventsystem/Thread.cc:85
#23 0x00002ae87314fdc5 in start_thread () from /lib64/libpthread.so.0
#24 0x00002ae873e7e76d in clone () from /lib64/libc.so.6
```
Looking at frame 1, client_vc is non-null, but it has been freed (its vtable
pointer is bogus). Looking back at ProxyClientSession::handle_api_return, were
we removed the call to net_vc, we see there is still a problem.
```
NetVConnection *vc = this->get_netvc();
if (vc) {
vc->do_io_close();
}
free(); // You can now clean things up
break;
```
We free the vc the call to do_io_close because we came in on the other vc, so
this vc does not have its recursion count bumped. The Http2ClientSession still
has a stale reference to the vc via its client_vc member, so then bad things
can happen if that vc has already been freed.
Http2CientSession and Http1ClientSession both call do_io_close and clear the
client VC in their ::do_io_close methods. But for their ::free methods only
Http2ClientSession calls do_io_close on the client vc. Http1ClientSession does
not.
Either both should or neither should. We cannot safely do both in
handle_api_return. After calling ::free the session is gone, so we cannot
reach in and clean up the VC afterwards. If we continue with the early
vc->do_io_close, we would have to make another call to set the VC to null
before calling into ::free.
At this point, it would be cleanest just to defer the client vc clean up to the
Http*ClientSession::free logic, which is what I propose in this PR.
[ Full content available at: https://github.com/apache/trafficserver/pull/4178 ]
This message was relayed via gitbox.apache.org for [email protected]