Hi Sam,

> We are running dual HAProxy machines as our load balancers for our
> web application, with keepalived for failover.  This is the 2nd time
> that both HAProxy instances have died in production with no indication
> as to why.

When this happens, do you always see both HAProxy instances crashing at the
same time?



> The load on the servers at the time was extremely low.  We routinely do
> MUCH MUCH more traffic than we had at the time of the last crash. Our
> environment will stay up for months at a time without a hickup, and then
> boom.

Do you have outages on the backends or the network in between the proxy and
the backend when this happens?

I'm asking because you also posted:
> HAProxy error log before crash:
>    Nov 10 19:02:32 localhost haproxy[10926]: backend chat-cluster-01 has
>    no server available!


... which matches a major crash bug in the HAProxy release you are using:

http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=0fc36e3ae99ccbe6de88cf64093f3045e526d088


There is another major bug fixed in commit 506d050600, but I don't think you
are running into it.


Both bugs are fixed in HAProxy snapshot 20130707 (haproxy-ss-20130707.tar.gz)
and later:

http://haproxy.1wt.eu/download/1.5/src/snapshot/


I would suggest that you:
- clarify whether there is a connection between backend server failures
  and the HAProxy crashes
- either backport the bugfix or upgrade to one of the snapshots containing
  the bugfix
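
For the first point, a small script can pull the "no server available"
events out of the HAProxy log so you can compare them with the crash times.
This is only a rough Python sketch: the function name and the regular
expression are my own assumptions, modelled on the single log line you
posted, so adjust them to your syslog format.

```python
import re

# Pattern modelled on the one log line quoted in this thread; it is an
# assumption about the syslog layout, not an official HAProxy log grammar.
NO_SERVER_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"haproxy\[(?P<pid>\d+)\]: "
    r"backend (?P<backend>\S+) has no server available!"
)


def find_no_server_events(lines):
    """Return a (timestamp, pid, backend) tuple for every matching line."""
    events = []
    for line in lines:
        match = NO_SERVER_RE.match(line.strip())
        if match:
            events.append(
                (match.group("ts"), int(match.group("pid")), match.group("backend"))
            )
    return events
```

Feed it the lines of your HAProxy log file and check whether each crash is
preceded by one of these events for the same PID.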

Note that -dev17 has 2 security problems (one fixed in dev18, another one
fixed in dev19, see http://haproxy.1wt.eu/news.html), so I would suggest
you upgrade both HAProxy instances.


If my guess here is wrong and HAProxy is still crashing, you need to
enable core dumps:

http://www.mail-archive.com/[email protected]/msg09472.html
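
Before waiting for the next crash, it is worth confirming that the running
haproxy process is really allowed to write a core file. On Linux the limits
of a process can be read from /proc/<pid>/limits. The Python sketch below is
my own illustration, not taken from the linked message: the helper name is
made up, and it assumes the usual "Max core file size" row in that file.

```python
def parse_core_limit(limits_text):
    """Return (soft, hard) core file size limits as strings, or None.

    limits_text is the content of /proc/<pid>/limits on Linux. A soft
    limit of "0" means the process will not write a core file.
    """
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


def core_limit_for_pid(pid):
    """Read and parse /proc/<pid>/limits (Linux only; may need root)."""
    with open("/proc/%d/limits" % pid) as handle:
        return parse_core_limit(handle.read())
```

Call core_limit_for_pid() with the PID of the haproxy process. If the soft
limit comes back as "0", core dumps are still disabled for it.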



Regards,

Lukas                                     