calavera opened a new issue #7255:
URL: https://github.com/apache/trafficserver/issues/7255


   There has been some investigation into this over the years, but it was 
never really resolved. @oknet added asserts that crash the server when the 
keep-alive queue is touched without holding the NetHandler's lock, and we hit 
one of them this week.
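
The kind of check behind that assert can be modeled roughly as follows. This is a minimal sketch, not Traffic Server code: `Handler`, `held_by_caller`, and the `int` connection id are hypothetical names, and the sketch throws an exception where the real assertion aborts the process.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a handler that owns a keep-alive queue.
// It records which thread holds its lock so callers can be checked.
class Handler {
public:
  void lock()
  {
    mutex_.lock();
    owner_.store(std::this_thread::get_id());
  }

  void unlock()
  {
    owner_.store(std::thread::id{});
    mutex_.unlock();
  }

  // True only if the calling thread currently holds the lock.
  bool held_by_caller() const { return owner_.load() == std::this_thread::get_id(); }

  // Refuses to touch the queue unless the caller holds the lock.
  void add_to_keep_alive_queue(int conn_id)
  {
    if (!held_by_caller()) {
      throw std::logic_error("lock must be held before touching keep_alive_queue");
    }
    queue_.push_back(conn_id);
  }

  // Not synchronized; meant for single-threaded inspection in this sketch.
  std::size_t queued() const { return queue_.size(); }

private:
  std::mutex mutex_;
  std::atomic<std::thread::id> owner_{};
  std::deque<int> queue_;
};
```

With this model, calling `add_to_keep_alive_queue` on a bare `Handler` throws, while the same call inside a `std::lock_guard<Handler>` scope succeeds. The crash in this issue corresponds to the first case: a code path reaches the queue without the lock.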
   
   It's hard for me to give a working example that reproduces it because the 
code is tightly coupled to a lot of other code, but I'll explain what it does 
in the hope that someone has an idea for how to solve it.
   
   1. We have a plugin with session hooks, TS_EVENT_HTTP_SSN_START and 
TS_EVENT_HTTP_SSN_CLOSE.
   2. On SSN_START, the plugin sends a request to `https://localhost`, which is 
the Traffic Server instance itself, which then proxies to a Google Cloud 
Storage (GCS) bucket. We do this because we want to cache the response from 
GCS on the same node that sends the request.
   3. On SSN_CLOSE, we do two things: remove an object from the session with 
`TSHttpSsnArgSet`, and then reenable the session with 
`TSHttpSsnReenable(session, TS_EVENT_HTTP_CONTINUE);`. The crash happens when 
the session releases its resources after it's reenabled.
   
   As I mentioned in the title, this only happens with HTTP/1 because HTTP/2 
doesn't have any of the same keep-alive logic. It's very easy to reproduce 
with `curl --http1.1` when "you have the whole system working" :laughing: 
   
   ```
   [Oct  3 19:19:35.072] [ET_NET 11] DIAG: (netlify_resolve_async) Resolving bundle...                          host=traffic-mesh-test.netlify.com
   Fatal: UnixNetVConnection.cc:1449: failed assertion `!"BUG: It must have acquired the NetHandler's lock before doing anything on keep_alive_queue."`
   traffic_server: received signal 6 (Aborted)
   traffic_server - STACK TRACE: 
   bin/traffic_server(_Z19crash_logger_invokeiP9siginfo_tPv+0xa8)[0x555d66264578]
   /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f4e5f2783c0]
   /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f4e5ed6418b]
   /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f4e5ed43859]
   /opt/ts/lib/libtscore.so.9(+0x60cc1)[0x7f4e5f77fcc1]
   /opt/ts/lib/libtscore.so.9(+0x5e01b)[0x7f4e5f77d01b]
   bin/traffic_server(_ZN18UnixNetVConnection23add_to_keep_alive_queueEv+0x125)[0x555d66506525]
   bin/traffic_server(_ZN18Http1ClientSession7releaseEP16ProxyTransaction+0x144)[0x555d662c91a4]
   bin/traffic_server(_ZN12ProxySession17handle_api_returnEi+0xbe)[0x555d664b0d9e]
   bin/traffic_server(_ZN12ProxySession17state_api_calloutEiPv+0x52)[0x555d664b0e12]
   bin/traffic_server(_ZN17TSHttpSsnCallback13event_handlerEiPv+0x6e)[0x555d66293c7e]
   bin/traffic_server(_ZN7EThread13process_eventEP5Eventi+0x34d)[0x555d6654f21d]
   bin/traffic_server(_ZN7EThread13process_queueEP5QueueI5EventNS1_9Link_linkEEPiS5_+0x24e)[0x555d6654f94e]
   bin/traffic_server(_ZN7EThread15execute_regularEv+0x186)[0x555d6654fdf6]
   bin/traffic_server(_ZN7EThread7executeEv+0x1ee)[0x555d6655054e]
   bin/traffic_server(+0x3e88a9)[0x555d6654e8a9]
   /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f4e5f26c609]
   /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f4e5ee40293]
   ```
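
The frames in the trace are mangled C++ symbols, which makes the call path harder to read. As a small aid, here is a sketch of a helper that demangles them, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle`; the `demangle` function name is just an illustration and is not part of Traffic Server.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Demangle an Itanium-ABI C++ symbol (as emitted by GCC/Clang).
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string &mangled)
{
  int status = 0;
  // __cxa_demangle returns a malloc'd buffer, so release it with free().
  std::unique_ptr<char, decltype(&std::free)> out(
    abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status), &std::free);
  return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

Applied to the three frames just above the abort, this gives `ProxySession::handle_api_return(int)`, then `Http1ClientSession::release(ProxyTransaction*)`, then `UnixNetVConnection::add_to_keep_alive_queue()`, which is the path from the session callback returning to the failed assertion. Piping the trace through `c++filt` should give the same names without writing any code.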



