On 16/10/2020 at 10:04, Christopher Faulet wrote:
On 13/10/2020 at 14:53, Peter Statham wrote:
Hello,

We've found an issue when using agent checks in conjunction with the weighted
least connections algorithm in multithreaded mode.  It seems to me as if it is
possible for next_eweight in struct server to be modified in another thread
during the execution of fwlc_srv_reposition.  If next_eweight is set to zero
then a division by zero occurs on line 59 in src/lb_fwlc.c in fwlc_queue_srv.
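
For illustration only, the failure mode described above can be sketched in a few lines of C. This is not the code from src/lb_fwlc.c: the struct, the field names, the scaling constant, and the function demo_queue_key are all hypothetical. The sketch assumes the key is derived by dividing a scaled connection count by the effective weight, and shows how reading the weight once into a local variable and testing it avoids dividing by a value that another thread has just set to zero.

```c
#include <stdatomic.h>
#include <stdint.h>

#define WEIGHT_SCALE 256u /* hypothetical scaling constant */

/* Hypothetical stand-in for the relevant fields of a server entry. */
struct demo_server {
    unsigned int served;          /* connections currently served */
    _Atomic unsigned int eweight; /* effective weight; another thread may set it to 0 */
};

/*
 * Compute the queue key for a server.
 * Returns 0 and stores the key in *key on success.
 * Returns -1 and leaves *key untouched when the weight is 0.
 */
static int demo_queue_key(struct demo_server *s, uint32_t *key)
{
    /* Read the weight exactly once, so the value tested is the value used. */
    unsigned int w = atomic_load(&s->eweight);

    if (w == 0)
        return -1; /* a zero-weight server must not be queued */

    *key = (uint32_t)(((uint64_t)s->served * WEIGHT_SCALE) / w);
    return 0;
}
```

A snapshot like this only removes the division by zero. It does not make the repositioning itself safe against concurrent changes, which still needs proper locking.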

I notice that in haproxy-2.0.18 this section of code is protected by
HA_SPINLOCKs and I've been unable to replicate this issue in that version.

I've written an agent (attached), bad_agent.py, which provokes this condition by
switching randomly between 1 and 0 percent.  I also include a minimal
configuration, cfg (also attached), which seems sufficient to cause the issue.
With these two running “ab -n 5000000 -c 500 http://192.168.92.1:8080/” will
quickly crash the haproxy process.

I include links to a coredump and the binary that generated it (unstripped).
The backtrace of thread 1 follows.


Hi,

Thanks for the reproducer. I'm able to crash HAProxy too using your config and
your agent. It seems to crash only on 1.8. I'll investigate.


Hi,

In fact, it fails in all branches that support threads. The leastconn and first load-balancing algorithms are affected by this bug. In leastconn, it may crash because of the division by 0 when the server weight is set to 0. But for both algorithms, the server tree may also be corrupted, leading to stranger and undefined bugs.
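
As a rough illustration of why both symptoms share one cause, here is a minimal C sketch in which the weight update and the repositioning take the same lock, so neither can observe the other half done. It is not a description of the actual fix: the struct demo_entry, the "queued" flag standing in for tree membership, the constant DEMO_SCALE, and all function names are hypothetical, and a tiny C11 atomic_flag spinlock stands in for whatever locking primitive the real code uses.

```c
#include <stdatomic.h>

#define DEMO_SCALE 256u /* hypothetical scaling constant */

/* Hypothetical server entry; "queued" stands in for tree membership. */
struct demo_entry {
    atomic_flag lock;    /* tiny spinlock guarding every field below */
    unsigned int weight;
    unsigned int served;
    unsigned int key;
    int queued;
};

static void demo_init(struct demo_entry *e, unsigned int weight)
{
    atomic_flag_clear(&e->lock);
    e->weight = weight;
    e->served = 0;
    e->key = 0;
    e->queued = 0;
}

static void demo_lock(struct demo_entry *e)
{
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_entry *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Recompute the position. The caller must hold the lock. */
static void demo_requeue_locked(struct demo_entry *e)
{
    if (e->weight == 0) {
        e->queued = 0; /* zero weight: leave the entry out of the queue */
        return;
    }
    e->key = e->served * DEMO_SCALE / e->weight;
    e->queued = 1;
}

/* Weight change (for example from an agent check) and requeue, as one step. */
static void demo_set_weight(struct demo_entry *e, unsigned int weight)
{
    demo_lock(e);
    e->weight = weight;
    demo_requeue_locked(e);
    demo_unlock(e);
}

/* Connection count change and requeue, under the same lock. */
static void demo_set_served(struct demo_entry *e, unsigned int served)
{
    demo_lock(e);
    e->served = served;
    demo_requeue_locked(e);
    demo_unlock(e);
}
```

Because every path that changes the weight or the position goes through the same lock, the weight cannot drop to zero between the check and the division, and the queue state is never modified by two threads at once.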

I pushed a fix (commit 26a52a) and backported it as far as 1.8. So, it should be fixed in all branches now.

Thanks!
--
Christopher Faulet
