Re: ip_nonlocal_bind=1 set but sometimes get "cannot bind socket" on reload (-sf)

Chris Riley Fri, 30 Oct 2015 08:19:48 -0700

Hi Willy,

The permissions were one of the first things I checked. consul-template
runs as root so that it can reload/restart the daemon, and it uses the
same init script that the system uses on startup. Not all of the reloads
fail; the first few are successful. What's odd is that the behavior goes
away when I fail over all IPs to one server and
set net.ipv4.ip_nonlocal_bind=0. After that all reloads are successful, no
matter how many times in a row reload is called. The issue remains at bay
even after failing half of the IPs back over to the secondary server and
setting net.ipv4.ip_nonlocal_bind=1 again. That holds until the servers
reboot, at which point the behavior returns. Vincent got me thinking about
the 2.6.32
kernel that is part of CentOS 6.4. I'm wondering if
net.ipv4.ip_nonlocal_bind behaves oddly in 2.6.x with respect to the status
of existing socket file descriptors. I'm going to try kernel 3.10 from
CentOS 7 to see whether the problem reproduces there, which should confirm
or rule out an issue with the kernel.
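
One way to separate the two hypotheses (an address that is not local versus
a port still held by the old process) is to look at the errno that bind()
returns. A minimal Python sketch, assuming Linux errno semantics; the helper
names try_bind and nonlocal_bind_enabled are hypothetical, and this is not
how haproxy itself reports the error:

```python
import errno
import socket


def try_bind(addr, port):
    """Try to bind a TCP socket; return "OK" or the symbolic errno name."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((addr, port))
        return "OK"
    except OSError as exc:
        # EADDRNOTAVAIL: the address is not local and non-local bind is not in effect.
        # EADDRINUSE: another socket (e.g. the old process) still holds the port.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


def nonlocal_bind_enabled(path="/proc/sys/net/ipv4/ip_nonlocal_bind"):
    """Return True/False from the sysctl file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return None
```

Calling try_bind() with one of the floating IPs and its port right after a
failed reload should show which case applies: EADDRINUSE would point at the
old process not releasing the port, while EADDRNOTAVAIL would point at the
sysctl not taking effect.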

However, I'm not sure that's the issue. When a reload fails there is
nothing in the log file that indicates that haproxy saw SIGTTOU or SIGUSR1
("Pausing %s %s." and "Stopping %s %s in %d ms."). I can reproduce this
behavior if I don't provide a PID to -sf. Looking at the code in proxy.c,
it seems that either pause_proxy() is not being called by pause_proxies in
haproxy.c (because SIGTTOU was missed), or the proxy state check at the top
of pause_proxy() is returning 1 before anything is paused. I'm going to add
some additional logging statements to see if I can isolate what's
happening.
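
Since a missing PID after -sf reproduces the failure, one thing worth ruling
out is an empty or stale PID file at reload time. A rough Python sketch,
assuming the init script passes the contents of a PID file to -sf and a
POSIX system; the helper name live_pids is hypothetical:

```python
import os


def live_pids(pidfile_text):
    """Return the PIDs listed in pidfile_text that refer to existing processes."""
    alive = []
    for token in pidfile_text.split():
        if not token.isdigit():
            continue  # ignore anything that is not a plain PID
        pid = int(token)
        if pid <= 0:
            continue  # PID 0 would address the whole process group
        try:
            os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
        except ProcessLookupError:
            continue  # no such process: stale entry
        except PermissionError:
            pass  # process exists but belongs to another user
        alive.append(pid)
    return alive
```

An empty result just before a reload would mean -sf is effectively called
without a PID, so the old process would never receive SIGTTOU or SIGUSR1.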

Regards,
Chris

On Fri, Oct 30, 2015 at 3:11 AM, Willy Tarreau <[email protected]> wrote:

> On Fri, Oct 30, 2015 at 08:04:48AM +0100, Vincent Bernat wrote:
> > On 30 October 2015 00:34 -0400, Chris Riley <[email protected]> wrote:
> >
> > > The kernel version is 2.6.32-358.23.2.el6.x86_64, the OS is CentOS
> > > 6.4.
> >
> > With this version of the kernel, the previous instance of HAProxy has to
> > release the port before the new one can bind. It seems that in your
> > case, this doesn't happen. Nothing suspicious in the logs of the
> > previous instance?
>
> It would be nice to ensure the process is reloaded with appropriate
> permissions. The new process indeed needs to send a signal to the old
> process, and bind to the ports. If any of these operations fail, it will
> not be able to start.
>
> Willy
>
>
