calavera opened a new issue #7255: URL: https://github.com/apache/trafficserver/issues/7255
There has been some investigation into this over the years, but it was never really resolved. @oknet added some asserts to the code that crash when the keep-alive queue is touched without holding the NetHandler's lock, and we bumped into one this week. It's hard for me to give a working example to reproduce it because it's tightly coupled with a lot of other code, but I'll explain what the code does, so hopefully someone has an idea how to solve it.

1. We have a plugin with session hooks, `TS_EVENT_HTTP_SSN_START` and `TS_EVENT_HTTP_SSN_CLOSE`.
2. On SSN_START, the plugin sends a request to `https://localhost`, which is the Traffic Server service itself, which then proxies it to a Google Cloud Storage (GCS) bucket. We do this because we want to cache the response from GCS on the same node that sends the request.
3. On SSN_CLOSE, we do two things: remove an object from the session with `TSHttpSsnArgSet`, and then reenable the session with `TSHttpSsnReenable(session, TS_EVENT_HTTP_CONTINUE);`.

The crash happens when the session is released after it's reenabled: as the stack trace shows, `Http1ClientSession::release` tries to put the connection back on the keep-alive queue and fails the assertion. As I mention in the title, this only happens with HTTP/1.1, because HTTP/2 doesn't have the same keep-alive logic. It's very easy to reproduce with `curl --http1.1` when "you have the whole system working" :laughing:

```
[Oct 3 19:19:35.072] [ET_NET 11] DIAG: (netlify_resolve_async) Resolving bundle... host=traffic-mesh-test.netlify.com
Fatal: UnixNetVConnection.cc:1449: failed assertion `!"BUG: It must have acquired the NetHandler's lock before doing anything on keep_alive_queue."`
traffic_server: received signal 6 (Aborted)
traffic_server - STACK TRACE:
bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa8)[0x555d66264578]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f4e5f2783c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f4e5ed6418b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f4e5ed43859]
/opt/ts/lib/libtscore.so.9(+0x60cc1)[0x7f4e5f77fcc1]
/opt/ts/lib/libtscore.so.9(+0x5e01b)[0x7f4e5f77d01b]
bin/traffic_server(_ZN18UnixNetVConnection23add_to_keep_alive_queueEv+0x125)[0x555d66506525]
bin/traffic_server(_ZN18Http1ClientSession7releaseEP16ProxyTransaction+0x144)[0x555d662c91a4]
bin/traffic_server(_ZN12ProxySession17handle_api_returnEi+0xbe)[0x555d664b0d9e]
bin/traffic_server(_ZN12ProxySession17state_api_calloutEiPv+0x52)[0x555d664b0e12]
bin/traffic_server(_ZN17TSHttpSsnCallback13event_handlerEiPv+0x6e)[0x555d66293c7e]
bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x34d)[0x555d6654f21d]
bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x24e)[0x555d6654f94e]
bin/traffic_server(_ZN7EThread15execute_regularEv+0x186)[0x555d6654fdf6]
bin/traffic_server(_ZN7EThread7executeEv+0x1ee)[0x555d6655054e]
bin/traffic_server(+0x3e88a9)[0x555d6654e8a9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f4e5f26c609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f4e5ee40293]
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
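As a sketch of the invariant behind the `keep_alive_queue` assertion in the trace, here is a minimal, self-contained C++ model. It assumes, without confirmation from the trace, that `UnixNetVConnection::add_to_keep_alive_queue` try-locks the NetHandler's mutex and fails the assertion when that lock can't be taken, for example because the code runs on a thread other than the one that owns the NetHandler. `NetHandlerModel`, `release_on_owner_thread` and `release_on_other_thread` are hypothetical names, and `std::recursive_mutex` stands in for a lock its owning thread may re-acquire. This is not Traffic Server code.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical model, NOT Traffic Server code. std::recursive_mutex stands in
// for a NetHandler lock that its owning thread is allowed to re-acquire.
struct NetHandlerModel {
  std::recursive_mutex mutex;
  std::deque<int> keep_alive_queue;  // ids of idle connections
};

// Assumed try-lock-then-check pattern: the queue may only be touched if the
// calling thread can take (or already holds) the handler's lock.
bool add_to_keep_alive_queue(NetHandlerModel &nh, int vc_id) {
  std::unique_lock<std::recursive_mutex> lock(nh.mutex, std::try_to_lock);
  if (!lock.owns_lock()) {
    // Where Traffic Server fails its "BUG: It must have acquired the
    // NetHandler's lock..." assertion, the model just reports failure.
    return false;
  }
  nh.keep_alive_queue.push_back(vc_id);
  return true;
}

// release() running on the owning thread, which already holds the lock
// (as its event loop would), so the re-entrant try-lock succeeds.
bool release_on_owner_thread(NetHandlerModel &nh, int vc_id) {
  std::lock_guard<std::recursive_mutex> owner_holds(nh.mutex);
  return add_to_keep_alive_queue(nh, vc_id);
}

// release() dispatched to some other thread while the owner still holds the
// lock: the try-lock fails and the connection is not queued.
bool release_on_other_thread(NetHandlerModel &nh, int vc_id) {
  std::lock_guard<std::recursive_mutex> owner_holds(nh.mutex);
  bool queued = true;
  std::thread other([&nh, vc_id, &queued] { queued = add_to_keep_alive_queue(nh, vc_id); });
  other.join();  // join() makes the write to `queued` visible here
  return queued;
}
```

In this model the owner-thread release queues the connection while the cross-thread release is rejected. That is one hypothetical way the `TSHttpSsnCallback` -> `Http1ClientSession::release` path in the trace could reach the queue without the lock; treat it as something to check, not a diagnosis.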