Hello Jerry,
your trace is very interesting.
> 03:36:45 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 35
> 03:36:45 setsockopt(35, SOL_TCP, TCP_NODELAY, [1], 4) = 0
> 03:36:45 connect(35, {sa_family=AF_INET, sin_port=htons(44757),
> sin_addr=inet_addr("172.17.48.32")}, 16) = -1 EINPROGRESS (Operation now in
> progress)
> 03:36:45 sendto(35, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454,
> MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily
> unavailable)
> 03:36:45 sendto(35, "GET /hosts/45/metrics HTTP/1.1\r\n"..., 454,
> MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED(Connection refused)
This means that something between haproxy and the server actively
rejected the connection. A full local conntrack table would not do
that; the connection would simply time out.
Still, is it possible that the remote server sometimes rejects some
connections? One possibility is that your local source port range is
too small and that the server rejects the early reuse of the same
source port.
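
To get an idea of how small the range is, something like the sketch
below could be run on the haproxy machine. It assumes a Linux host
exposing /proc/sys/net/ipv4/ip_local_port_range; port_range_size is
only a helper name made up for this example.

```shell
# Number of ports in an inclusive range LOW..HIGH (hypothetical helper).
port_range_size() {
    echo $(( $2 - $1 + 1 ))
}

# On Linux, the ephemeral source port range is exposed as two numbers.
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    read -r low high < "$range_file"
    echo "local port range: $low-$high ($(port_range_size "$low" "$high") ports)"
fi
```

The smaller this number is compared to the connection rate towards a
single server address and port, the sooner a given source port gets
reused.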
In fact, what could be done is to capture only the SYN, FIN and RST
packets between the two machines:
tcpdump -npi eth0 'tcp[13]&7!=0' -w tcp-flags.cap
Then look for any RST in the capture and, once its source port is
found, check whether it matches a SYN:
tcpdump -Svvnr tcp-flags.cap 'tcp[13]&4!=0'
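
For reference, tcp[13] is the TCP flags byte, in which FIN is bit 0x01,
SYN is 0x02 and RST is 0x04, so a mask of 7 keeps all three and a mask
of 4 keeps only RST. A small sketch to decode such a mask (the function
name flags_for_mask is made up for this example, and only these three
flags are handled):

```shell
# Print which of FIN/SYN/RST are selected by a tcp[13] bit mask.
flags_for_mask() {
    mask=$1
    out=""
    if [ $(( mask & 1 )) -ne 0 ]; then out="$out FIN"; fi
    if [ $(( mask & 2 )) -ne 0 ]; then out="$out SYN"; fi
    if [ $(( mask & 4 )) -ne 0 ]; then out="$out RST"; fi
    # Strip the leading space left by the first append.
    echo "${out# }"
}

flags_for_mask 7   # mask used for the capture
flags_for_mask 4   # mask used when reading the capture back
```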
Are you sure that the server never unbinds or crashes? And do you
have any intermediate firewall between haproxy and the remote server?
Regards,
Willy