Hi Jarno,

On Fri, Oct 04, 2019 at 07:28:15PM +0300, Jarno Huuskonen wrote:
> I sent pcap/strace offlist.

Thanks, that was very useful.

> (strace -f -o -ttt, tcpdump -n -p -s 16384 -w ... host 127.0.0.1 and
> port 8080).
> 
> I think in the packet capture the second health check causes
> "client_addr: 127.0.0.1 client_port: 2779] hr_read(): Connection reset by 
> peer [plugins/http/http.c line 917]"
> (I think uwsgi logs client_port incorrectly, ntohs(2779) gives 56074
> (and port 56074 is in the packet capture)).

As usual, when you start to troubleshoot a bug you find a myriad of
other ones around :-)

> (haproxy version: HA-Proxy version 2.1-dev2 2019/10/01).
> 
> I tried to reproduce with a very minimal flask/uwsgi hello world app
> and there the hr_read error happens very rarely.
> With the alerta(.io) app this happens more regularly (AFAIK not with every check).
> So maybe this is a weird timing issue or a bug in uwsgi.

So it is indeed a timing issue. The check running from source port 56062
shows the following sequence:

17:10:57.877876 IP localhost.56062 > localhost.8080: Flags [S]
17:10:57.877898 IP localhost.8080 > localhost.56062: Flags [S.]
17:10:57.877909 IP localhost.56062 > localhost.8080: Flags [.]
17:10:57.878065 IP localhost.56062 > localhost.8080: Flags [P.]
17:10:57.878078 IP localhost.8080 > localhost.56062: Flags [.]
17:10:57.879933 IP localhost.8080 > localhost.56062: Flags [P.]
17:10:57.879939 IP localhost.56062 > localhost.8080: Flags [.]
17:10:57.880008 IP localhost.8080 > localhost.56062: Flags [F.]
17:10:57.880333 IP localhost.56062 > localhost.8080: Flags [F.]
17:10:57.880341 IP localhost.8080 > localhost.56062: Flags [.]

Note the FIN sent 75 microseconds after the response. This resulted in
recvfrom() returning zero and the connection being cleanly closed. Now
regarding port 56074, which is causing the excess logs:

17:11:04.132867 IP localhost.56074 > localhost.8080: Flags [S]
17:11:04.132890 IP localhost.8080 > localhost.56074: Flags [S.]
17:11:04.132904 IP localhost.56074 > localhost.8080: Flags [.]
17:11:04.133083 IP localhost.56074 > localhost.8080: Flags [P.]
17:11:04.133098 IP localhost.8080 > localhost.56074: Flags [.]
17:11:04.135101 IP localhost.8080 > localhost.56074: Flags [P.]
17:11:04.135107 IP localhost.56074 > localhost.8080: Flags [.]
17:11:04.135316 IP localhost.56074 > localhost.8080: Flags [R.]

As you can see, even 215 microseconds after the response there is still
no FIN. I've checked and in both cases the headers are the same: there
is indeed a "connection: close" header emitted by haproxy, and none is
advertised in the response, so it's clearly just a matter of delay. This
causes recvfrom() to return EAGAIN, so haproxy is forced to hard-close
the connection (as if it were doing option http-server-close). Please
also note that the returned data are properly ACKed, so there is not
much more we can do there.
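
For what it's worth, here's a minimal sketch of the distinction the
check code is facing once the response has been consumed (plain POSIX
on Linux, not actual haproxy code; the peer_closed() helper is purely
illustrative):

    #include <errno.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Illustrative helper (not from haproxy): probe a socket after the
     * full response was read:
     *  - recv() == 0           -> the peer's FIN already arrived (read0),
     *                             so a clean close is possible;
     *  - recv() == -1 + EAGAIN -> nothing pending and no FIN yet, so an
     *                             immediate close has to be a hard one.
     */
    static int peer_closed(int fd)
    {
        char buf[256];
        ssize_t ret = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

        if (ret == 0)
            return 1;   /* read0 seen: close cleanly */
        if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;   /* no FIN yet: closing now implies an RST */
        return 0;       /* extra data or a real error: out of scope here */
    }

In the 56062 trace above the FIN was already there so the first branch
was taken; in the 56074 trace the check hit the EAGAIN case and had to
reset.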

Before the patch that you bisected, you didn't face this because of a
bug that was preventing the hard close from working, and instead you
were accumulating TIME_WAIT sockets on the client side. So I'm afraid
that for now you'll still have to live with these occasional logs :-/

I'd really like to see a massive rework done on the checks. Your report
made me realize that there is one feature we're currently missing,
which is the ability to wait for a close once we've got a response. At
the moment we cannot add this because checks work in 2 steps:

     1) connect and send the "request"
     2) receive the "response" and close.

There's no real state machine there, it's roughly "if the request was
not sent, send it, otherwise try to receive a response; if a response
is received, then process it and close, otherwise keep waiting for the
response". So we cannot remain in a "waiting for close" state after we
receive a response. I'm wondering however if that couldn't be achieved
using tcp-checks. I quickly tried but couldn't find a reliable way of
doing it, though I'm thinking that we could possibly extend the
tcp-check rules to have a "tcp-check expect close" that would wait for
a read0.

Regards,
Willy
