Hello!

On Mon, Mar 25, 2024 at 01:31:26PM +0100, Sébastien Rebecchi wrote:
> I have an issue with nginx closing prematurely connections when reload is
> performed.
>
> I have some nginx servers configured to proxy_pass requests to an upstream
> group. This group itself is composed of several servers which are nginx
> themselves, and is configured to use keepalive connections.
>
> When I trigger a reload (-s reload) on an nginx of one of the servers which
> is target of the upstream, I see in error logs of all servers in front that
> connection was reset by the nginx which was reloaded.

[...]

> And here the kind of error messages I get when I reload nginx of "IP_1":
>
> --- BEGIN ---
>
> 2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:
> Connection reset by peer) while reading response header from upstream,
> client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST
> /REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream:
> "http://IP_1:80/REQUEST_LOCATION_HIDDEN", host: "HOST_HIDDEN", referrer:
> "REFERRER_HIDDEN"
>
> --- END ---
>
> I thought -s reload was doing graceful shutdown of connections. Is it due
> to the fact that nginx can not handle that when using keepalive
> connections? Is it a bug?
>
> I am using nginx 1.24.0 everywhere, no particular

This looks like a well-known race condition when closing HTTP connections.
In RFC 2616, it is documented as follows
(https://datatracker.ietf.org/doc/html/rfc2616#section-8.1.4):

   A client, server, or proxy MAY close the transport connection at any
   time. For example, a client might have started to send a new request
   at the same time that the server has decided to close the "idle"
   connection. From the server's point of view, the connection is being
   closed while it was idle, but from the client's point of view, a
   request is in progress.

   This means that clients, servers, and proxies MUST be able to recover
   from asynchronous close events.
   Client software SHOULD reopen the transport connection and retransmit
   the aborted sequence of requests without user interaction so long as
   the request sequence is idempotent (see section 9.1.2). Non-idempotent
   methods or sequences MUST NOT be automatically retried, although user
   agents MAY offer a human operator the choice of retrying the
   request(s). Confirmation by user-agent software with semantic
   understanding of the application MAY substitute for user confirmation.
   The automatic retry SHOULD NOT be repeated if the second sequence of
   requests fails.

That is, when you shut down your backend server, it closes its keepalive
connections, which is expected to be perfectly safe from the server's point
of view. But if, at the same time, a request is being sent on such a
connection by the client (the frontend nginx server in your case), this can
result in an error.

Note that the race is generally unavoidable, and such errors can happen at
any time a connection is closed by the server. Closing multiple keepalive
connections during shutdown makes such errors more likely, though, since
the connections are closed right away rather than after the keepalive
timeout expires. Further, since in your case there are just a few heavily
loaded keepalive connections, this also makes errors during shutdown more
likely.

The typical solution is to retry such requests, as RFC 2616 recommends.
In particular, nginx does so based on the "proxy_next_upstream" setting.
Note that to retry POST requests you will need "proxy_next_upstream ...
non_idempotent;" (which implies that non-idempotent requests will be
retried on errors, and might not be the desired behaviour).

Another possible approach is to try to minimize the race window by waiting
some time after the shutdown before closing keepalive connections.
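For reference, the retry approach might look like the sketch below on the
frontend servers. The upstream name and server addresses are hypothetical,
and which conditions to list in proxy_next_upstream depends on your
application:

```nginx
# Frontend nginx: keepalive connections to the backends, with retries
# for requests hit by the close race.  Names/addresses are examples only.
upstream backend {
    server 192.0.2.10:80;
    server 192.0.2.11:80;

    # Cache up to 16 idle keepalive connections per worker process.
    keepalive 16;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Needed for keepalive connections to the upstream:
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Retry failed requests on the next server; "non_idempotent"
        # additionally allows retrying POST and other non-idempotent
        # requests, which might not be desired for all applications.
        proxy_next_upstream error timeout non_idempotent;
    }
}
```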
There were several attempts in the past to implement this; the last one can
be found here:

https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html

While there are some open questions about that particular patch, something
like this should probably be implemented. This is on my TODO list, so a
proper solution should eventually be available out of the box in upcoming
freenginx releases.

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/
-- 
nginx mailing list
nginx@freenginx.org
https://freenginx.org/mailman/listinfo/nginx