Re: healthchecks (to uwsgi) possible regression 1.9.8 -> 1.9.9

Jarno Huuskonen Fri, 04 Oct 2019 09:29:14 -0700

Hi Willy,

On Fri, Oct 04, Willy Tarreau wrote:
> Hi Jarno,
> 
> On Wed, Oct 02, 2019 at 01:08:14PM +0300, Jarno Huuskonen wrote:
> > Hello,
> > 
> > I was testing haproxy -> uwsgi (alerta.io) and noticed a possible regression
> > with health checks (httpchk).
> > With 1.9.9 uwsgi logs:
> > [uwsgi-http key: host.name.fi client_addr: 127.0.0.1 client_port: 45715] hr_read(): Connection reset by peer [plugins/http/http.c line 917]
> > 
> > health checks work
> > (option httpchk GET /_ HTTP/1.1\r\nHost:\ host.name.fi\r\nUser-Agent:\ haproxy),
> > but uwsgi logs the hr_read() warning/error.
> > 
> > I bisected 1.9.9, and this is probably the commit that changes the
> > behaviour between 1.9.8 and 1.9.9:
> > 5d0cb90eb78f869e8801b34eddfdfd5dd8360e71 is the first bad commit
> > commit 5d0cb90eb78f869e8801b34eddfdfd5dd8360e71
> > Author: Olivier Houchard <ohouch...@haproxy.com>
> > Date:   Fri Jun 14 15:26:06 2019 +0200
> > 
> >     BUG/MEDIUM: connections: Don't call shutdown() if we want to disable linger.
> > 
> >     In conn_sock_shutw(), avoid calling shutdown() if linger_risk is set. Not
> >     doing so will result in getting sockets in TIME_WAIT for some time.
> >     This is particularly observable with health checks.
> >     This is particularly observable with health checks.
> > 
> >     This should be backported to 1.9.
> > 
> >     (cherry picked from commit fe4abe62c7c5206dff1802f42d17014e198b9141)
> >     Signed-off-by: Christopher Faulet <cfau...@haproxy.com>
> 
> Hmmm that's annoying, really, because we've opened a huge can of worms
> when fixing the first failed check and we're constantly displacing the
> problem somewhere else :-/
> 
> Yes, please do provide an strace, and a tcpdump, that would be nice.
> I suspect that we'll possibly see a FIN from the server, without the
> equivalent recv()==0 in haproxy, and that the setsockopt() call
> resulting in the RST being sent doesn't ack the FIN. Normally we
> should perform a clean shutdown+close if the FIN was received with
> the response and detected in time.
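For readers less familiar with the two ways of closing a TCP socket that the commit and Willy's reply refer to, here is a minimal C sketch (POSIX sockets on Linux, IPv4 loopback). It is not HAProxy's code: conn_sock_shutw() and linger_risk are HAProxy internals, and the helper names graceful_close, abortive_close and server_view_after_close are hypothetical. It only illustrates that shutdown()+close() gives the peer a clean EOF, while setting SO_LINGER with a zero timeout before close() sends an RST, which the peer's recv() reports as ECONNRESET ("Connection reset by peer", the error uwsgi logs).

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Graceful close: send FIN first, then release the descriptor.
 * The side that sends the first FIN normally ends up in TIME_WAIT. */
static void graceful_close(int fd)
{
    shutdown(fd, SHUT_WR);
    close(fd);
}

/* Abortive close: SO_LINGER with a zero timeout makes close() drop the
 * connection and send RST instead of FIN, so no TIME_WAIT is left. */
static void abortive_close(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    close(fd);
}

/* Hypothetical test harness: connects a client to a loopback listener,
 * closes the client with `closer`, and returns what the server's recv()
 * sees: 0 for an orderly EOF, the errno value (e.g. ECONNRESET) on error,
 * or -1 if the setup fails or unexpected data arrives. */
static int server_view_after_close(void (*closer)(int))
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[16];
    int lfd, cfd = -1, sfd = -1, result = -1;
    ssize_t n;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    closer(cfd);                            /* the client goes away */
    cfd = -1;
    n = recv(sfd, buf, sizeof(buf), 0);     /* blocks until FIN or RST */
    if (n == 0)
        result = 0;                         /* orderly EOF */
    else if (n < 0)
        result = errno;                     /* e.g. ECONNRESET */
out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return result;
}
```

The abortive variant keeps the closing side out of TIME_WAIT, which is what the commit was after, at the cost of the peer seeing a reset instead of an orderly close.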


I sent the pcap and strace off-list
(strace -f -o -ttt, tcpdump -n -p -s 16384 -w ... host 127.0.0.1 and
port 8080).

I think the second health check in the packet capture causes
"client_addr: 127.0.0.1 client_port: 2779] hr_read(): Connection reset by peer [plugins/http/http.c line 917]"
(I think uwsgi logs client_port incorrectly: ntohs(2779) gives 56074,
and port 56074 is in the packet capture).
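The port arithmetic can be checked with a small C sketch. It assumes, as suspected but not confirmed, that uwsgi printed the sin_port field (stored in network byte order, i.e. big-endian) without converting it, on a little-endian host. The helper names are hypothetical.

```c
#include <stdint.h>

/* Correct reading: network byte order is big-endian. */
static uint16_t port_from_network_bytes(const unsigned char b[2])
{
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* What a little-endian host gets when the raw sin_port value is
 * printed without ntohs(). */
static uint16_t port_misread_little_endian(const unsigned char b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Swapping the two bytes maps one reading onto the other; this is
 * what ntohs() does on a little-endian host. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

With the two bytes 0xDB 0x0A (port 56074 on the wire), the first function returns 56074 and the second returns 2779, matching the port in the capture and the one in the uwsgi log.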

(haproxy version used: HA-Proxy version 2.1-dev2 2019/10/01.)

I tried to reproduce it with a very minimal Flask/uwsgi hello-world app,
and there the hr_read() error happens very rarely.
With the alerta(.io) app it happens more regularly (AFAIK not with every check).
So maybe this is a weird timing issue or a bug in uwsgi.

> I'm seeing that your check request doesn't contain "connection: close",
> so actually it's possible that your server doesn't send the FIN, in
> which case we really need to close with RST. Could you please try to
> add "connection: close" to your httpchk line?

I had Connection: close, but removed it after I added
http-check expect string OK
(http-check expect adds Connection: close).
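To make the difference concrete, a minimal C sketch of the request bytes the check would send. It assumes that HAProxy expands the \r, \n and "\ " escapes in the httpchk argument and, as stated in this reply, appends "Connection: close" when http-check expect is configured, followed by the blank line that ends the headers. The exact header order and the helper names unescape_httpchk and build_check_request are assumptions for illustration, not taken from HAProxy's source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expands "\r" -> CR, "\n" -> LF, "\ " -> space and
 * "\\" -> backslash; any other character is copied unchanged.
 * Returns the output length, or 0 if the output buffer is too small. */
static size_t unescape_httpchk(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    for (; *in; in++) {
        char c = *in;

        if (c == '\\' && in[1] != '\0') {
            in++;
            c = (*in == 'r') ? '\r' : (*in == 'n') ? '\n' : *in;
        }
        if (n + 1 >= outsz)
            return 0;
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}

/* Hypothetical helper: unescaped check line, optional "Connection: close"
 * header, then the empty line that terminates the request headers.
 * Returns the request length, or 0 on error. */
static size_t build_check_request(const char *cfg, bool add_close,
                                  char *out, size_t outsz)
{
    static const char close_hdr[] = "\r\nConnection: close";
    size_t n = unescape_httpchk(cfg, out, outsz);
    size_t extra = (add_close ? strlen(close_hdr) : 0) + 4;

    if (n == 0 || n + extra + 1 > outsz)
        return 0;
    if (add_close) {
        memcpy(out + n, close_hdr, strlen(close_hdr));
        n += strlen(close_hdr);
    }
    memcpy(out + n, "\r\n\r\n", 4);
    n += 4;
    out[n] = '\0';
    return n;
}
```

Without the extra header, an HTTP/1.1 server is free to keep the connection open after answering, so it sends no FIN and the check has to be torn down with an RST.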

-Jarno

-- 
Jarno Huuskonen