Hi, I'm so happy you're able to replicate it! :)
With that patch that disables pool_flush I can still reproduce the crash
on my R&D server and in production, just in different places.

On R&D:

(gdb) bt
#0  tasklet_wakeup (tl=0xd720c300a0000000) at include/haproxy/task.h:328
#1  h2s_notify_recv (h2s=h2s@entry=0x55d720c2d500) at src/mux_h2.c:1037
#2  0x000055d71f44d3a0 in h2s_notify_recv (h2s=0x55d720c2d500) at include/haproxy/trace.h:150
#3  h2s_close (h2s=0x55d720c2d500) at src/mux_h2.c:1236
#4  0x000055d71f450c26 in h2s_frt_make_resp_headers (htx=0x55d720ae4c90, h2s=0x55d720c2d500) at src/mux_h2.c:4795
#5  h2_snd_buf (cs=0x55d720c31000, buf=0x55d720c2d888, count=182, flags=<optimized out>) at src/mux_h2.c:5888
#6  0x000055d71f4fb9fa in si_cs_send (cs=0x55d720c31000) at src/stream_interface.c:737
#7  0x000055d71f4fc2c0 in si_sync_send (si=si@entry=0x55d720c2db48) at src/stream_interface.c:914
#8  0x000055d71f49ea91 in process_stream (t=<optimized out>, context=0x55d720c2d810, state=<optimized out>) at src/stream.c:2245
#9  0x000055d71f55cfe9 in run_tasks_from_list (list=list@entry=0x55d71f96cb40 <task_per_thread+64>, max=max@entry=149) at src/task.c:371
#10 0x000055d71f55d7ca in process_runnable_tasks () at src/task.c:519
#11 0x000055d71f517c15 in run_poll_loop () at src/haproxy.c:2900
#12 0x000055d71f517fc9 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3065
#13 0x000055d71f3ef87e in main (argc=<optimized out>, argv=0x7fff7a4ef218) at src/haproxy.c:3767

On production:

#0  0x0000557070df32d6 in h2s_notify_recv (h2s=0x7ff0fc696670) at src/mux_h2.c:1035
#1  h2s_close (h2s=0x7ff0fc696670) at src/mux_h2.c:1236
#2  0x0000557070df7922 in h2s_frt_make_resp_data (count=<optimized out>, buf=0x7ff0ec87ee78, h2s=0x7ff0fc696670) at src/mux_h2.c:5466
#3  h2_snd_buf (cs=0x7ff118af9790, buf=0x7ff0ec87ee78, count=3287, flags=<optimized out>) at src/mux_h2.c:5903
#4  0x0000557070ea19fa in si_cs_send (cs=cs@entry=0x7ff118af9790) at src/stream_interface.c:737
#5  0x0000557070ea1c5b in stream_int_chk_snd_conn (si=0x7ff0ec87f138) at src/stream_interface.c:1121
#6  0x0000557070e9f112 in si_chk_snd (si=0x7ff0ec87f138) at include/haproxy/stream_interface.h:488
#7  stream_int_notify (si=si@entry=0x7ff0ec87f190) at src/stream_interface.c:490
#8  0x0000557070ea1f48 in si_cs_process (cs=cs@entry=0x7ff0fc93d9d0) at src/stream_interface.c:624
#9  0x0000557070ea31fa in si_cs_io_cb (t=<optimized out>, ctx=0x7ff0ec87f190, state=<optimized out>) at src/stream_interface.c:792
#10 0x0000557070f030ed in run_tasks_from_list (list=list@entry=0x557071312c50 <task_per_thread+336>, max=<optimized out>) at src/task.c:348
#11 0x0000557070f037da in process_runnable_tasks () at src/task.c:523
#12 0x0000557070ebdc15 in run_poll_loop () at src/haproxy.c:2900
#13 0x0000557070ebdfc9 in run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:3065
#14 0x00007ff1cc2f16db in start_thread (arg=0x7ff11f7da700) at pthread_create.c:463
#15 0x00007ff1cb287a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

On Tue, 10 Nov 2020 at 17:19, Willy Tarreau <[email protected]> wrote:
> On Tue, Nov 10, 2020 at 04:14:52PM +0100, Willy Tarreau wrote:
> > Seems like we're getting closer. Will continue digging now.
>
> I found that among the 5 crashes I got, 3 were under pool_flush(),
> which is precisely called during the soft stopping. I tried to
> disable that function with the patch below and I can't reproduce
> the problem anymore; it would be nice if you could test it. I'm
> suspecting that either it copes badly with the lockless pools,
> or that pool_gc() itself, called from the signal handler, could
> possibly damage some of the pools and cause some loose objects to
> be used, returned and reused once reallocated. I see no reason
> for the relation with SPOE like this, but maybe it just helps
> trigger the complex condition.
>
> diff --git a/src/pool.c b/src/pool.c
> index 321f8bc67..5e2f41fe9 100644
> --- a/src/pool.c
> +++ b/src/pool.c
> @@ -246,7 +246,7 @@ void pool_flush(struct pool_head *pool)
>  	void **next, *temp;
>  	int removed = 0;
>
> -	if (!pool)
> +	//if (!pool)
>  		return;
>  	HA_SPIN_LOCK(POOL_LOCK, &pool->lock);
>  	do {
>
> I'm continuing to investigate.
>
> Willy
>

