Hi, all. This annoying bug can be reproduced on haproxy 1.7 through 2.0
(1.9 also added a separate bug with high CPU utilization, unrelated to
this one).
In essence, once an external server that we forward internal requests to
stops responding for some time and comes back to life a bit later, more
often than not haproxy can no longer reach it.
The configuration is very simple:

global
    maxconn 16384
    daemon
    nbproc 1
    user nobody
    group nobody
    log /var/run/log local0

defaults
    retries 3
    timeout connect 5000
    timeout client 3600000
    timeout server 3600000
    log global
    option log-health-checks

listen amazon_ses
    bind 127.0.0.2:2588
    mode tcp
    no option http-server-close
    default_backend bk_amazon_ses

backend bk_amazon_ses
    mode tcp
    no option http-server-close
    timeout connect 5s
    server amazon email-smtp.us-west-2.amazonaws.com:587 check inter 30s fall 1440 rise 1

That's it. If email-smtp.us-west-2.amazonaws.com:587 fails
intermittently and the downtime lasts longer than a few 30-second
checks, it can no longer be reached via 127.0.0.2:2588 even after the
external server resumes normal operation, and nothing short of a reload
(-sf) fixes the problem.
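
For anyone who wants to check for the same symptom, here is a rough
Python sketch that compares the upstream reached directly with the same
upstream reached through the haproxy listener. It is only an
illustration: the helper names smtp_banner and compare are made up, the
10-second timeout is arbitrary, and it assumes the upstream sends an
SMTP greeting line right after the TCP connection is established. A
plain connect test is not enough here, because haproxy accepts the
client connection on 127.0.0.2:2588 even when it cannot reach the
backend.

```python
import socket


def smtp_banner(host, port, timeout=10.0):
    """Return the first line the peer sends after connect, or None.

    None covers every failure mode: connection refused, timeout, or the
    peer closing the connection without sending anything.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(512)
    except OSError:
        return None
    if not data:
        return None
    return data.decode("ascii", errors="replace").splitlines()[0]


def compare(direct, proxied, timeout=10.0):
    """Probe both (host, port) pairs and flag the 'stuck proxy' case.

    'stuck' is True only when the direct path answers and the proxied
    path does not, which is the symptom described in this post.
    """
    d = smtp_banner(direct[0], direct[1], timeout=timeout)
    p = smtp_banner(proxied[0], proxied[1], timeout=timeout)
    return {"direct": d, "proxied": p, "stuck": d is not None and p is None}
```

With the configuration above, the call would be
compare(("email-smtp.us-west-2.amazonaws.com", 587), ("127.0.0.2", 2588)).
If "stuck" comes back True after the external server has recovered,
that matches what I am seeing.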