Hi Willy, On Fri, Oct 04, Willy Tarreau wrote: > Hi Jarno, > > On Wed, Oct 02, 2019 at 01:08:14PM +0300, Jarno Huuskonen wrote: > > Hello, > > > > I was testing haproxy -> uwsgi(alert.io) and noticed a possible regression > > with healthchecks(httpchk). > > With 1.9.9 uwsgi logs: > > [uwsgi-http key: host.name.fi client_addr: 127.0.0.1 client_port: 45715] > > hr_read(): Connection reset by peer [plugins/http/http.c line 917] > > > > health checks work > > (option httpchk GET /_ HTTP/1.1\r\nHost:\ host.name.fi\r\nUser-Agent:\ > > haproxy) > > but uwsgi logs the hr_read() warning/error. > > > > I bisected 1.9.9 and this commit is probably the commit that changes > > behaviour between 1.9.8 and 1.9.9: > > 5d0cb90eb78f869e8801b34eddfdfd5dd8360e71 is the first bad commit > > commit 5d0cb90eb78f869e8801b34eddfdfd5dd8360e71 > > Author: Olivier Houchard <ohouch...@haproxy.com> > > Date: Fri Jun 14 15:26:06 2019 +0200 > > > > BUG/MEDIUM: connections: Don't call shutdown() if we want to disable > > linger. > > > > In conn_sock_shutw(), avoid calling shutdown() if linger_risk is set. > > Not > > doing so will result in getting sockets in TIME_WAIT for some time. > > This is particularly observable with health checks. > > > > This should be backported to 1.9. > > > > (cherry picked from commit fe4abe62c7c5206dff1802f42d17014e198b9141) > > Signed-off-by: Christopher Faulet <cfau...@haproxy.com> > > Hmmm that's annoying, really, because we've opened a huge can of worms > when fixing the first failed check and we're constantly displacing the > problem somewhere else :-/ > > Yes, please do provide an strace, and a tcpdump, that would be nice. > I suspect that we'll possibly see a FIN from the server, without the > equivalent recv()==0 in haproxy, and that the setsockopt() call > resulting in the RST being sent doesn't ack the FIN. Normally we > should perform a clean shutdown+close if the FIN was received with > the response and detected in time.
I sent pcap/strace offlist. (strace -f -o -ttt, tcpdump -n -p -s 16384 -w ... host 127.0.0.1 and port 8080). I think in packet capture the second health checks causes "client_addr: 127.0.0.1 client_port: 2779] hr_read(): Connection reset by peer [plugins/http/http.c line 917]" (I think uswgi logs client_port incorrectly, ntohs(2779) gives 56074 (and port 56074 is in packet capture)). (haproxy version: HA-Proxy version 2.1-dev2 2019/10/01). I tried to reproduce with very minimal flask/uwsgi hello world app and there hr_read happens very rarely. With alerta(.io) app this happens more regularly (AFAIK not with every check). So maybe this is weird timing issue or bug in uwsgi. > I'm seeing that your check request doesn't contain "connection: close", > so actually it's possible that your server doesn't send the SYN, in > which case we really need to close with RST. Could you please try to > add "connection: close" to your httpchk line ? I had Connection: close, but removed it after I added http-check expect string OK (http-check expect adds Connection: close). -Jarno -- Jarno Huuskonen