Hi,

I'm so happy you're able to replicate it! :)

With the patch that disables pool_flush I can still reproduce the crash on my
R&D server and in production, just in different places:

on r&d:
(gdb) bt
#0  tasklet_wakeup (tl=0xd720c300a0000000) at include/haproxy/task.h:328
#1  h2s_notify_recv (h2s=h2s@entry=0x55d720c2d500) at src/mux_h2.c:1037
#2  0x000055d71f44d3a0 in h2s_notify_recv (h2s=0x55d720c2d500) at
include/haproxy/trace.h:150
#3  h2s_close (h2s=0x55d720c2d500) at src/mux_h2.c:1236
#4  0x000055d71f450c26 in h2s_frt_make_resp_headers (htx=0x55d720ae4c90,
h2s=0x55d720c2d500) at src/mux_h2.c:4795
#5  h2_snd_buf (cs=0x55d720c31000, buf=0x55d720c2d888, count=182,
flags=<optimized out>) at src/mux_h2.c:5888
#6  0x000055d71f4fb9fa in si_cs_send (cs=0x55d720c31000) at
src/stream_interface.c:737
#7  0x000055d71f4fc2c0 in si_sync_send (si=si@entry=0x55d720c2db48) at
src/stream_interface.c:914
#8  0x000055d71f49ea91 in process_stream (t=<optimized out>,
context=0x55d720c2d810, state=<optimized out>) at src/stream.c:2245
#9  0x000055d71f55cfe9 in run_tasks_from_list (list=list@entry=0x55d71f96cb40
<task_per_thread+64>, max=max@entry=149) at src/task.c:371
#10 0x000055d71f55d7ca in process_runnable_tasks () at src/task.c:519
#11 0x000055d71f517c15 in run_poll_loop () at src/haproxy.c:2900
#12 0x000055d71f517fc9 in run_thread_poll_loop (data=<optimized out>) at
src/haproxy.c:3065
#13 0x000055d71f3ef87e in main (argc=<optimized out>, argv=0x7fff7a4ef218)
at src/haproxy.c:3767

on production:
#0  0x0000557070df32d6 in h2s_notify_recv (h2s=0x7ff0fc696670) at
src/mux_h2.c:1035
#1  h2s_close (h2s=0x7ff0fc696670) at src/mux_h2.c:1236
#2  0x0000557070df7922 in h2s_frt_make_resp_data (count=<optimized out>,
buf=0x7ff0ec87ee78, h2s=0x7ff0fc696670) at src/mux_h2.c:5466
#3  h2_snd_buf (cs=0x7ff118af9790, buf=0x7ff0ec87ee78, count=3287,
flags=<optimized out>) at src/mux_h2.c:5903
#4  0x0000557070ea19fa in si_cs_send (cs=cs@entry=0x7ff118af9790) at
src/stream_interface.c:737
#5  0x0000557070ea1c5b in stream_int_chk_snd_conn (si=0x7ff0ec87f138) at
src/stream_interface.c:1121
#6  0x0000557070e9f112 in si_chk_snd (si=0x7ff0ec87f138) at
include/haproxy/stream_interface.h:488
#7  stream_int_notify (si=si@entry=0x7ff0ec87f190) at
src/stream_interface.c:490
#8  0x0000557070ea1f48 in si_cs_process (cs=cs@entry=0x7ff0fc93d9d0) at
src/stream_interface.c:624
#9  0x0000557070ea31fa in si_cs_io_cb (t=<optimized out>,
ctx=0x7ff0ec87f190, state=<optimized out>) at src/stream_interface.c:792
#10 0x0000557070f030ed in run_tasks_from_list (list=list@entry=0x557071312c50
<task_per_thread+336>, max=<optimized out>) at src/task.c:348
#11 0x0000557070f037da in process_runnable_tasks () at src/task.c:523
#12 0x0000557070ebdc15 in run_poll_loop () at src/haproxy.c:2900
#13 0x0000557070ebdfc9 in run_thread_poll_loop (data=<optimized out>) at
src/haproxy.c:3065
#14 0x00007ff1cc2f16db in start_thread (arg=0x7ff11f7da700) at
pthread_create.c:463
#15 0x00007ff1cb287a3f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

On Tue, 10 Nov 2020 at 17:19 Willy Tarreau <[email protected]> wrote:

> On Tue, Nov 10, 2020 at 04:14:52PM +0100, Willy Tarreau wrote:
> > Seems like we're getting closer. Will continue digging now.
>
> I found that among the 5 crashes I got, 3 were under pool_flush()
> that is precisely called during the soft stopping. I tried to
> disable that function with the patch below and I can't reproduce
> the problem anymore, it would be nice if you could test it. I'm
> suspecting that either it copes badly with the lockless pools,
> or that pool_gc() itself, called from the signal handler, could
> possibly damage some of the pools and cause some lose objects to
> be used, returned and reused once reallocated. I see no reason
> for the relation with SPOE like this, but maybe it just helps
> trigger the complex condition.
>
> diff --git a/src/pool.c b/src/pool.c
> index 321f8bc67..5e2f41fe9 100644
> --- a/src/pool.c
> +++ b/src/pool.c
> @@ -246,7 +246,7 @@ void pool_flush(struct pool_head *pool)
>         void **next, *temp;
>         int removed = 0;
>
> -       if (!pool)
> +       //if (!pool)
>                 return;
>         HA_SPIN_LOCK(POOL_LOCK, &pool->lock);
>         do {
>
> I'm continuing to investigate.
>
> Willy
>
