Hi Vincent,

What's odd is that if I fail over all virtual IPs to one server and
set net.ipv4.ip_nonlocal_bind=0 on that server, the issue goes away. The
issue remains "fixed" when I fail half of the virtual IPs back to the
secondary server and set net.ipv4.ip_nonlocal_bind=1. However, after a
reboot of both servers, the initial behavior comes back. This seems to be
related to the way the 2.6.32 kernel handles net.ipv4.ip_nonlocal_bind
and how that interacts with the sockets' file descriptors.
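
To confirm which value the kernel is actually using at the time of each
reload, something like the following could be run on both servers. It is
only a rough sketch: the helper name nonlocal_bind_enabled is made up for
illustration, and it assumes the sysctl is exposed at the usual procfs path.

```python
from pathlib import Path

# Usual procfs location of net.ipv4.ip_nonlocal_bind on Linux (assumed here).
SYSCTL_PATH = "/proc/sys/net/ipv4/ip_nonlocal_bind"


def nonlocal_bind_enabled(path=SYSCTL_PATH):
    """Return True or False for the sysctl value, or None if it is unreadable."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # File missing or not readable (for example, on a non-Linux host).
        return None


if __name__ == "__main__":
    print("ip_nonlocal_bind enabled:", nonlocal_bind_enabled())
```

Logging this right before each reload would show whether the failures line up
with a particular value of the setting.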

The logs don't show anything suspicious. When a reload is successful I see
the expected output in the logs:

Oct 30 09:49:53 127.0.0.1 haproxy[26191]: Proxy haproxy-stats started.
Oct 30 09:50:22 127.0.0.1 haproxy[26192]: Pausing proxy haproxy-stats.
Oct 30 09:50:22 127.0.0.1 haproxy[26215]: Proxy haproxy-stats started.
Oct 30 09:50:22 127.0.0.1 haproxy[26192]: Stopping proxy haproxy-stats in 0 ms.
Oct 30 09:50:22 127.0.0.1 haproxy[26192]: Proxy haproxy-stats stopped (FE: 0 conns, BE: 0 conns).

When a reload is unsuccessful, the code that pauses the old proxy, starts a
new one, and stops the original isn't called, so there is no output in the
logs. Instead, the Alert (cannot bind socket) is sent to stderr and is
logged by consul-template.
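
For reference, the bind failure itself is easy to reproduce outside of
HAProxy. The sketch below is not HAProxy's code; try_bind is a hypothetical
helper, and it assumes both sockets set SO_REUSEADDR. On Linux, a second
socket cannot bind to an address and port while another socket is still
listening there, and it can once the listener is closed.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to bind a TCP socket with SO_REUSEADDR set.

    Returns (socket, 0) on success or (None, errno) on failure.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s.bind((host, port))
    except OSError as e:
        s.close()
        return None, e.errno
    return s, 0


if __name__ == "__main__":
    # "Old" instance: bind to an ephemeral port and start listening.
    old, _ = try_bind(0)
    old.listen(1)
    port = old.getsockname()[1]

    # "New" instance tries the same port while the old one still listens.
    new, err = try_bind(port)
    print("while old listens:", errno.errorcode.get(err, "bound"))

    # Once the old listener releases the port, the bind goes through.
    old.close()
    new, err = try_bind(port)
    print("after old closes:", errno.errorcode.get(err, "bound"))
    if new is not None:
        new.close()
```

If the old process never reaches the point where it pauses or closes its
listeners, the new process would be stuck in the first case.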

I'm going to compile the 3.10 kernel from CentOS 7 for CentOS 6, see whether
the behavior persists, and report back.

Thanks,
Chris


On Fri, Oct 30, 2015 at 3:04 AM, Vincent Bernat <[email protected]> wrote:

>  ❦ 30 octobre 2015 00:34 -0400, Chris Riley <[email protected]> :
>
> > The kernel version is 2.6.32-358.23.2.el6.x86_64, the OS is CentOS
> > 6.4.
>
> With this version of the kernel, the previous instance of HAProxy has to
> release the port before the new one can bind. It seems that in your
> case, this doesn't happen. Nothing suspicious in the logs of the
> previous instance?
> --
> Let us endeavor so to live that when we come to die even the undertaker
> will be sorry.
>                 -- Mark Twain, "Pudd'nhead Wilson's Calendar"
>
