Hi!

It seems to me that something is wrong with this patch: for some reason the
process stops responding, with all threads at 100% CPU. Backtrace:

(gdb) thread apply all bt

Thread 4 (Thread 0x7fdf68c9c700 (LWP 615744)):
#0  0x0000564fc9a61990 in fwrr_update_server_weight (srv=0x564fcb5014b0) at src/lb_fwrr.c:198
#1  0x0000564fc99b5363 in srv_update_status (s=0x564fcb5014b0) at src/server.c:4923
#2  0x0000564fc99b46e2 in server_recalc_eweight (sv=sv@entry=0x564fcb5014b0, must_update=must_update@entry=1) at src/server.c:1310
#3  0x0000564fc99b6ca2 in server_parse_weight_change_request (sv=sv@entry=0x564fcb5014b0, weight_str=weight_str@entry=0x564fcb50a1d0 "68%") at src/server.c:1356
#4  0x0000564fc99c1f3c in __event_srv_chk_r (cs=cs@entry=0x7fdf62885e20) at src/checks.c:1114
#5  0x0000564fc99c5000 in event_srv_chk_io (t=<optimized out>, ctx=0x564fcb501b70, state=<optimized out>) at src/checks.c:730
#6  0x0000564fc9a56bb2 in process_runnable_tasks () at src/task.c:390
#7  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#8  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#9  0x00007fdf6c7326ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007fdf6b70241d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 3 (Thread 0x7fdf6949d700 (LWP 615743)):
#0  0x0000564fc9a61e7a in fwrr_get_next_server (p=0x564fcabd8e60, srvtoavoid=srvtoavoid@entry=0x0) at src/lb_fwrr.c:528
#1  0x0000564fc9a11fa8 in assign_server (s=s@entry=0x7fdf54860b80) at src/backend.c:673
#2  0x0000564fc9a12b07 in assign_server_and_queue (s=s@entry=0x7fdf54860b80) at src/backend.c:963
#3  0x0000564fc9a15e07 in assign_server_and_queue (s=0x7fdf54860b80) at include/proto/freq_ctr.h:55
#4  srv_redispatch_connect (s=s@entry=0x7fdf54860b80) at src/backend.c:1621
#5  0x0000564fc9988836 in sess_prepare_conn_req (s=0x7fdf54860b80) at src/stream.c:1163
#6  process_stream (t=<optimized out>, context=0x7fdf54860b80, state=<optimized out>) at src/stream.c:2310
#7  0x0000564fc9a56807 in process_runnable_tasks () at src/task.c:387
#8  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#9  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#10 0x00007fdf6c7326ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fdf6b70241d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7fdf69c9e700 (LWP 615742)):
#0  0x0000564fc9a61e7a in fwrr_get_next_server (p=0x564fcabd8e60, srvtoavoid=srvtoavoid@entry=0x0) at src/lb_fwrr.c:528
#1  0x0000564fc9a11fa8 in assign_server (s=s@entry=0x7fdf667a3690) at src/backend.c:673
#2  0x0000564fc9a12b07 in assign_server_and_queue (s=s@entry=0x7fdf667a3690) at src/backend.c:963
#3  0x0000564fc9a15e07 in assign_server_and_queue (s=0x7fdf667a3690) at include/proto/freq_ctr.h:55
#4  srv_redispatch_connect (s=s@entry=0x7fdf667a3690) at src/backend.c:1621
#5  0x0000564fc9988836 in sess_prepare_conn_req (s=0x7fdf667a3690) at src/stream.c:1163
#6  process_stream (t=<optimized out>, context=0x7fdf667a3690, state=<optimized out>) at src/stream.c:2310
#7  0x0000564fc9a56807 in process_runnable_tasks () at src/task.c:387
#8  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#9  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#10 0x00007fdf6c7326ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fdf6b70241d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7fdf6cf27180 (LWP 615741)):
#0  fwrr_get_server_from_group (grp=0x564fcabd9b88) at src/lb_fwrr.c:464
#1  fwrr_get_next_server (p=0x564fcabd8e60, srvtoavoid=srvtoavoid@entry=0x0) at src/lb_fwrr.c:556
#2  0x0000564fc9a11fa8 in assign_server (s=s@entry=0x564fd48c4f90) at src/backend.c:673
#3  0x0000564fc9a12b07 in assign_server_and_queue (s=s@entry=0x564fd48c4f90) at src/backend.c:963
#4  0x0000564fc9a15e07 in assign_server_and_queue (s=0x564fd48c4f90) at include/proto/freq_ctr.h:55
#5  srv_redispatch_connect (s=s@entry=0x564fd48c4f90) at src/backend.c:1621
#6  0x0000564fc9988836 in sess_prepare_conn_req (s=0x564fd48c4f90) at src/stream.c:1163
#7  process_stream (t=<optimized out>, context=0x564fd48c4f90, state=<optimized out>) at src/stream.c:2310
#8  0x0000564fc9a56807 in process_runnable_tasks () at src/task.c:387
#9  0x0000564fc99ccba0 in run_poll_loop () at src/haproxy.c:2652
#10 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2717
#11 0x0000564fc992779c in main (argc=<optimized out>, argv=<optimized out>) at src/haproxy.c:3379

On Wed, Apr 17, 2019 at 05:11, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Maksim,
>
> On Tue, Apr 16, 2019 at 07:28:28AM +0200, Willy Tarreau wrote:
> > > So I agree upon another thread activity. The unique thing about
> > > these servers - all of them use haproxy-agent to set up weights of their
> > > backends. Other instances with no haproxy-agent in their configs don't
> > > produce cores.
> >
> > Great, this will definitely help me validate my hypothesis. I'm not sure
> > the fix will be easy but I'm back to this.
>
> OK so I could finally figure what the problem was and fix it. The upper
> level function used to expect to be called with the server's lock held
> while it is responsible for choosing the server... As you can expect,
> it didn't have good chances to resist to concurrency.
>
> I've merged the fix into 2.0-dev and backported it into 1.9-maint. Feel
> free to update to latest 1.9 git or snapshot.
>
> Thank you very much for your report, it was extremely helpful!
> Willy
>