Hello Willy!

I hope some of the cores are still available; I will search for them
tomorrow.
Since they may contain sensitive information, it's not a good idea to
share them here on the mailing list.
Could you please give me a personal email address where I could send
a link to a core?

--
Best regards,
Maksim Kupriianov

Thu, 11 Apr 2019 at 17:03, Willy Tarreau <[email protected]>:

> Hi again,
>
> On Thu, Apr 11, 2019 at 11:53:28AM +0200, Willy Tarreau wrote:
> > > Got multiple incidents of failure with 1.9.6:
> > > Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy'.
> > > Program terminated with signal SIGFPE, Arithmetic exception.
> > > #0  0x0000559afb73c533 in fwrr_update_position (grp=0x559afbd9fb68,
> > > grp=0x559afbd9fb68, s=0x559afcc5f560) at src/lb_fwrr.c:498
> > > 498 HA_ATOMIC_ADD(&s->npos, (grp->next_weight / s->cur_eweight));
> > > [Current thread is 1 (Thread 0x7f879677c700 (LWP 776412))]
> > > (gdb) thread apply all bt
> >
> > Scary, that's not supposed to be possible in theory :
> >
> >   /* Computes next position of server <s> in the group. It is mandatory for <s>
> >    * to have a non-zero, positive eweight.
> >                ^^^^^^^^^
> >    *
> >    * The server's lock and the lbprm's lock must be held.
> >    */
> >   static inline void fwrr_update_position(struct fwrr_group *grp, struct server *s)
> >
> > So either we're doing something wrong somewhere in a caller, or we have
> > insufficient locking and sometimes this server's weight is put down to
> > zero between the moment the value is checked and the moment it's used.
> >
> > I'm having a look at it right now.
>
> I tried to follow all paths that lead to a zero cur_eweight that I could
> find and none of them leave the server in the tree. Then I tried to find
> all cases where this entry is updated or used and all are under the server
> lock, meaning that I don't see how another thread could have changed the
> value between the check and the use. I must obviously be wrong on at least
> one of them but I really can't figure which one. I guess the core will
> probably help a little bit if you still have it somewhere.
>
> Thanks,
> Willy
>
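
To make the suspected failure mode concrete: the crash at src/lb_fwrr.c:498
is an integer division by s->cur_eweight, so a weight that reaches zero
between the check and the division would produce exactly this SIGFPE.
Below is a minimal sketch in C of reading the weight once and testing that
same value before dividing. The names demo_group, demo_server and
demo_update_position are hypothetical stand-ins for illustration only; this
is not HAProxy's code and not a proposed fix.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not HAProxy's real structures. */
struct demo_group {
    unsigned int next_weight;
};

struct demo_server {
    _Atomic unsigned int cur_eweight; /* may be set to zero by another thread */
    _Atomic unsigned int npos;
};

/*
 * Advance the server's position by next_weight / weight.
 * The weight is loaded exactly once into a local variable, so the zero
 * check and the division are guaranteed to see the same value.
 * Returns false (and leaves npos untouched) when the weight is zero.
 */
static bool demo_update_position(const struct demo_group *grp,
                                 struct demo_server *s)
{
    unsigned int w = atomic_load(&s->cur_eweight);

    if (w == 0)
        return false; /* an integer division by zero here would trap */

    atomic_fetch_add(&s->npos, grp->next_weight / w);
    return true;
}
```

With next_weight = 100 and a weight of 4, a call adds 25 to npos; with a
weight of 0 it returns false instead of dividing. This only illustrates the
check-then-use window; it says nothing about where the zero comes from.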
