Hello Willy! I hope I can find some cores that are still available; I will search for them tomorrow. But since they could contain sensitive information, it's not a good idea to share them here on the mailing list. Could you please give me a personal email address where I could send a link to a core?
-- 
Best regards,
Maksim Kupriianov

On Thu, 11 Apr 2019 at 17:03, Willy Tarreau <[email protected]> wrote:

> Hi again,
>
> On Thu, Apr 11, 2019 at 11:53:28AM +0200, Willy Tarreau wrote:
> > > Got multiple incidents of failure with 1.9.6:
> > > Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p
> > > /var/run/haproxy'.
> > > Program terminated with signal SIGFPE, Arithmetic exception.
> > > #0 0x0000559afb73c533 in fwrr_update_position (grp=0x559afbd9fb68,
> > > grp=0x559afbd9fb68, s=0x559afcc5f560) at src/lb_fwrr.c:498
> > > 498 HA_ATOMIC_ADD(&s->npos, (grp->next_weight / s->cur_eweight));
> > > [Current thread is 1 (Thread 0x7f879677c700 (LWP 776412))]
> > > (gdb) thread apply all bt
> >
> > Scary, that's not supposed to be possible in theory :
> >
> > /* Computes next position of server <s> in the group. It is mandatory for <s>
> >  * to have a non-zero, positive eweight.
> >              ^^^^^^^^^
> >  *
> >  * The server's lock and the lbprm's lock must be held.
> >  */
> > static inline void fwrr_update_position(struct fwrr_group *grp, struct server *s)
> >
> > So either we're doing something wrong somewhere in a caller, or we have
> > insufficient locking and sometimes this server's weight is put down to
> > zero between the moment the value is checked and the moment it's used.
> >
> > I'm having a look at it right now.
>
> I tried to follow all paths that lead to a zero cur_eweight that I could
> find and none of them leave the server in the tree. Then I tried to find
> all cases where this entry is updated or used and all are under the server
> lock, meaning that I don't see how another thread could have changed the
> value between the check and the use. I must obviously be wrong on at least
> one of them but I really can't figure which one. I guess the core will
> probably help a little bit if you still have it somewhere.
>
> Thanks,
> Willy

