Hi!

It seems to me there is something wrong with this patch: for some reason
the process stops responding, with all threads at 100% CPU.
Backtrace:
(gdb) thread apply all bt

Thread 4 (Thread 0x7fdf68c9c700 (LWP 615744)):
#0  0x0000564fc9a61990 in fwrr_update_server_weight (srv=0x564fcb5014b0) at
src/lb_fwrr.c:198
#1  0x0000564fc99b5363 in srv_update_status (s=0x564fcb5014b0) at
src/server.c:4923
#2  0x0000564fc99b46e2 in server_recalc_eweight (sv=sv@entry=0x564fcb5014b0,
must_update=must_update@entry=1) at src/server.c:1310
#3  0x0000564fc99b6ca2 in server_parse_weight_change_request
(sv=sv@entry=0x564fcb5014b0,
weight_str=weight_str@entry=0x564fcb50a1d0 "68%") at src/server.c:1356
#4  0x0000564fc99c1f3c in __event_srv_chk_r (cs=cs@entry=0x7fdf62885e20) at
src/checks.c:1114
#5  0x0000564fc99c5000 in event_srv_chk_io (t=<optimized out>,
ctx=0x564fcb501b70, state=<optimized out>) at src/checks.c:730
#6  0x0000564fc9a56bb2 in process_runnable_tasks () at src/task.c:390
#7  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#8  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#9  0x00007fdf6c7326ba in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fdf6b70241d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 3 (Thread 0x7fdf6949d700 (LWP 615743)):
#0  0x0000564fc9a61e7a in fwrr_get_next_server (p=0x564fcabd8e60,
srvtoavoid=srvtoavoid@entry=0x0) at src/lb_fwrr.c:528
#1  0x0000564fc9a11fa8 in assign_server (s=s@entry=0x7fdf54860b80) at
src/backend.c:673
#2  0x0000564fc9a12b07 in assign_server_and_queue (s=s@entry=0x7fdf54860b80)
at src/backend.c:963
#3  0x0000564fc9a15e07 in assign_server_and_queue (s=0x7fdf54860b80) at
include/proto/freq_ctr.h:55
#4  srv_redispatch_connect (s=s@entry=0x7fdf54860b80) at src/backend.c:1621
#5  0x0000564fc9988836 in sess_prepare_conn_req (s=0x7fdf54860b80) at
src/stream.c:1163
#6  process_stream (t=<optimized out>, context=0x7fdf54860b80,
state=<optimized out>) at src/stream.c:2310
#7  0x0000564fc9a56807 in process_runnable_tasks () at src/task.c:387
#8  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#9  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#10 0x00007fdf6c7326ba in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fdf6b70241d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7fdf69c9e700 (LWP 615742)):
#0  0x0000564fc9a61e7a in fwrr_get_next_server (p=0x564fcabd8e60,
srvtoavoid=srvtoavoid@entry=0x0) at src/lb_fwrr.c:528
#1  0x0000564fc9a11fa8 in assign_server (s=s@entry=0x7fdf667a3690) at
src/backend.c:673
#2  0x0000564fc9a12b07 in assign_server_and_queue (s=s@entry=0x7fdf667a3690)
at src/backend.c:963
#3  0x0000564fc9a15e07 in assign_server_and_queue (s=0x7fdf667a3690) at
include/proto/freq_ctr.h:55
#4  srv_redispatch_connect (s=s@entry=0x7fdf667a3690) at src/backend.c:1621
#5  0x0000564fc9988836 in sess_prepare_conn_req (s=0x7fdf667a3690) at
src/stream.c:1163
#6  process_stream (t=<optimized out>, context=0x7fdf667a3690,
state=<optimized out>) at src/stream.c:2310
#7  0x0000564fc9a56807 in process_runnable_tasks () at src/task.c:387
#8  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#9  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#10 0x00007fdf6c7326ba in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fdf6b70241d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7fdf6cf27180 (LWP 615741)):
#0  fwrr_get_server_from_group (grp=0x564fcabd9b88) at src/lb_fwrr.c:464
#1  fwrr_get_next_server (p=0x564fcabd8e60, srvtoavoid=srvtoavoid@entry=0x0)
at src/lb_fwrr.c:556
#2  0x0000564fc9a11fa8 in assign_server (s=s@entry=0x564fd48c4f90) at
src/backend.c:673
#3  0x0000564fc9a12b07 in assign_server_and_queue (s=s@entry=0x564fd48c4f90)
at src/backend.c:963
#4  0x0000564fc9a15e07 in assign_server_and_queue (s=0x564fd48c4f90) at
include/proto/freq_ctr.h:55
#5  srv_redispatch_connect (s=s@entry=0x564fd48c4f90) at src/backend.c:1621
#6  0x0000564fc9988836 in sess_prepare_conn_req (s=0x564fd48c4f90) at
src/stream.c:1163
#7  process_stream (t=<optimized out>, context=0x564fd48c4f90,
state=<optimized out>) at src/stream.c:2310
#8  0x0000564fc9a56807 in process_runnable_tasks () at src/task.c:387
#9  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#10 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#11 0x0000564fc992779c in main (argc=<optimized out>, argv=<optimized out>)
at src/haproxy.c:3379

On Wed, 17 Apr 2019 at 05:11, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Maksim,
>
> On Tue, Apr 16, 2019 at 07:28:28AM +0200, Willy Tarreau wrote:
> > > So I agree upon another thread activity. The unique thing about
> > > these servers - all of them use haproxy-agent to set up weights of
> > > their backends. Other instances with no haproxy-agent in their
> > > configs don't produce cores.
> >
> > Great, this will definitely help me validate my hypothesis. I'm not sure
> > the fix will be easy but I'm back to this.
>
> OK so I could finally figure what the problem was and fix it. The upper
> level function used to expect to be called with the server's lock held
> while it is responsible for choosing the server... As you can expect,
> it had little chance of withstanding concurrency.
>
> I've merged the fix into 2.0-dev and backported it into 1.9-maint. Feel
> free to update to latest 1.9 git or snapshot.
>
> Thank you very much for your report, it was extremely helpful!
> Willy
>
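
For anyone following the thread, here is a minimal sketch of the locking
problem described above, as I understand it: a picker cannot rely on the
caller holding the chosen server's lock, because the server is not known
until the picker returns. So the picker takes a lock covering the whole
group itself, and weight updates (like the ones an agent check triggers)
take the same lock. All names here (demo_server, demo_group,
demo_set_weight, demo_pick_server) are hypothetical, this is not HAProxy
code and not the actual fix, and the selection logic is a plain smooth
weighted round-robin rather than the FWRR trees in src/lb_fwrr.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT HAProxy's real structures. */
struct demo_server {
    int weight;  /* effective weight, may be changed by an agent check */
    int credit;  /* running credit used by the round-robin below */
};

struct demo_group {
    pthread_mutex_t lock;      /* protects srv[] and every field in it */
    struct demo_server *srv;
    size_t nsrv;
};

/* Writer path: a weight change request for one server. */
void demo_set_weight(struct demo_group *g, size_t idx, int weight)
{
    pthread_mutex_lock(&g->lock);
    assert(idx < g->nsrv);
    g->srv[idx].weight = weight;
    pthread_mutex_unlock(&g->lock);
}

/*
 * Reader path: pick the next server. The function takes the group lock
 * itself instead of assuming the caller holds a per-server lock, since
 * no server has been chosen yet when it is called.
 * Returns the server index, or -1 if no server has a positive weight.
 */
int demo_pick_server(struct demo_group *g)
{
    int best = -1;
    int total = 0;

    pthread_mutex_lock(&g->lock);
    for (size_t i = 0; i < g->nsrv; i++) {
        struct demo_server *s = &g->srv[i];

        if (s->weight <= 0)
            continue;           /* skip drained servers */
        s->credit += s->weight; /* every usable server earns its weight */
        total += s->weight;
        if (best < 0 || s->credit > g->srv[best].credit)
            best = (int)i;
    }
    if (best >= 0)
        g->srv[best].credit -= total; /* the winner pays the total back */
    pthread_mutex_unlock(&g->lock);
    return best;
}
```

With both paths serialized on the same lock, a weight change arriving in
the middle of a pick can no longer leave the picker looping over a
half-updated structure. The price is that all picks for one group
contend on a single mutex.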
