We've seen this crash every few days for a while.  Finally think I figured out 
the issue. 

```
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000000006c86db in Http2ClientSession::release_netvc 
(this=0x2ab223d4c380) at 
../../../../trafficserver/proxy/http2/Http2ClientSession.h:193
#2  0x00000000005bd53b in ProxyClientSession::handle_api_return 
(this=0x2ab223d4c380, event=0) at 
../../../trafficserver/proxy/ProxyClientSession.cc:211
#3  0x00000000005bd32e in ProxyClientSession::state_api_callout 
(this=0x2ab223d4c380, event=0, data=0x0) at 
../../../trafficserver/proxy/ProxyClientSession.cc:156
#4  0x00000000005bd418 in ProxyClientSession::do_api_callout 
(this=0x2ab223d4c380, id=TS_HTTP_SSN_CLOSE_HOOK) at 
../../../trafficserver/proxy/ProxyClientSession.cc:182
#5  0x00000000006c572a in Http2ClientSession::destroy (this=0x2ab223d4c380) at 
../../../../trafficserver/proxy/http2/Http2ClientSession.cc:82
#6  0x00000000006ce480 in Http2ConnectionState::release_stream 
(this=0x2ab223d4c620, stream=0x2abe0aae68c0) at 
../../../../trafficserver/proxy/http2/Http2ConnectionState.cc:1246
#7  0x00000000006beefb in Http2Stream::destroy (this=0x2abe0aae68c0) at 
../../../../trafficserver/proxy/http2/Http2Stream.cc:716
#8  0x00000000006bd25f in Http2Stream::terminate_if_possible 
(this=0x2abe0aae68c0) at 
../../../../trafficserver/proxy/http2/Http2Stream.cc:377
#9  0x00000000006bd19d in Http2Stream::transaction_done (this=0x2abe0aae68c0) 
at ../../../../trafficserver/proxy/http2/Http2Stream.cc:366
#10 0x0000000000669d12 in HttpSM::kill_this (this=0x2abedc914c00) at 
../../../../trafficserver/proxy/http/HttpSM.cc:7060
#11 0x000000000065aeeb in HttpSM::main_handler (this=0x2abedc914c00, 
event=2301, data=0x2abedc915f80) at 
../../../../trafficserver/proxy/http/HttpSM.cc:2815
#12 0x000000000083104e in Continuation::dispatchEvent (this=0x2abedc914c00, 
event=2301, data=0x2abedc915f80) at 
../../../../trafficserver/iocore/eventsystem/Continuation.cc:46
#13 0x00000000006b5a94 in HttpTunnel::main_handler (this=0x2abedc915f80, 
event=104, data=0x2ab4ec0866b0) at 
../../../../trafficserver/proxy/http/HttpTunnel.cc:1650
#14 0x0000000000830f67 in Continuation::handleEvent (this=0x2abedc915f80, 
event=104, data=0x2ab4ec0866b0) at 
../../../../trafficserver/iocore/eventsystem/Continuation.cc:33
#15 0x000000000080df14 in read_signal_and_update (event=104, vc=0x2ab4ec086560) 
at ../../../../trafficserver/iocore/net/UnixNetVConnection.cc:144
#16 0x000000000080e28f in read_signal_done (event=104, nh=0x2ab19b314cf0, 
vc=0x2ab4ec086560) at 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:205
#17 0x000000000080e989 in read_from_net (nh=0x2ab19b314cf0, vc=0x2ab4ec086560, 
thread=0x2ab19b311010) at 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:355
#18 0x00000000008109b5 in UnixNetVConnection::net_read_io (this=0x2ab4ec086560, 
nh=0x2ab19b314cf0, lthread=0x2ab19b311010) at 
../../../../trafficserver/iocore/net/UnixNetVConnection.cc:971
#19 0x0000000000805ffd in NetHandler::waitForActivity (this=0x2ab19b314cf0, 
timeout=60000000) at ../../../../trafficserver/iocore/net/UnixNet.cc:497
#20 0x00000000008336f4 in EThread::execute_regular (this=0x2ab19b311010) at 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:273
#21 0x00000000008338bb in EThread::execute (this=0x2ab19b311010) at 
../../../../trafficserver/iocore/eventsystem/UnixEThread.cc:326
#22 0x00000000008324f0 in spawn_thread_internal (a=0x1f54f30) at 
../../../../trafficserver/iocore/eventsystem/Thread.cc:85
#23 0x00002ab198c12dc5 in start_thread () from /lib64/libpthread.so.0
#24 0x00002ab19994176d in clone () from /lib64/libc.so.6
```

The stack is confusing when you look at the code be cause it appears that we 
are calling through a null client_vc, but the line above we check that 
client_vc is not null.  I think the real problem is that client_vc has been 
closed and frees in the ProxyClientSession::handle_api_return() call down the 
stack.

If the original event came in on client_vc, this would have been ok because the 
vc's recursion count would be > 0 and the actual free wouldn't occur until we 
unwound the stack.  But in this case the EOS event came in on the server VC, 
and this client VC is accessed form the HttpSM shutdown.

Just reordering the calls to vc->do_io_close and this->release_netvc in 
ProxyClientSession::handle_api_return should fix the issue.  But looking at the 
bigger picture, I don't think that release_netvc is needed at all at this 
point.  All it does is make sure that there is no lingering client session data 
on the netvc.  But it is always called right next to a call to close the netvc, 
so at this point it adds no value.

I'm putting up this PR, but will be running this change internally for a bit to 
verify that it does address the issue.

[ Full content available at: https://github.com/apache/trafficserver/pull/4147 ]
This message was relayed via gitbox.apache.org for [email protected]

Reply via email to