I learned something very important, which explains why I'm seeing a lot of reports of TCP resets specifically:
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout

> For each TCP request that a client makes through a Network Load Balancer,
> the state of that connection is tracked. If no data is sent through the
> connection by either the client or target for longer than the idle timeout,
> the connection is no longer tracked. If a client or target sends data after
> the idle timeout period elapses, the client receives a TCP RST packet to
> indicate that the connection is no longer valid.
>
> The default idle timeout value for TCP flows is 350 seconds, but can be
> updated to any value between 60-6000 seconds. Clients or targets can use
> TCP keepalive packets to restart the idle timeout. Keepalive packets sent
> to maintain TLS connections can't contain data or payload.

This is nasty: the client can't possibly know that it has a stale connection until it sends a request, and then the error it gets ("Connection reset by peer") is both highly generic and (unlike RequestNotExecutedException) not obviously safe to retry on.

The new TCP Keep-Alive options should eliminate this failure mode, and as I write this I'm publishing a change to enable a five-second keep-alive interval on all clients. This will also reduce the occurrence of the (hypothesized) Lambda-specific race condition, since there's no connection closure race if the connections don't get closed in the first place.

Thanks for the PR, I'll test my reproducer against it. An issue I noticed is that it's apparently not possible to read bytes _and_ endOfStream in a single read operation, which for us means that we can't discover that the connection has been closed until the next event loop iteration (and by then the connection might have been leased out again). Would it be safe to perform a second read that returns either 0 bytes or -1 (endOfStream)?
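
For anyone who wants to see what the keep-alive setting mentioned above amounts to at the socket level, here is a minimal sketch using only JDK socket options. It is not the HttpComponents configuration API and not the actual change I'm publishing: the class name KeepAliveSketch and the helper enableKeepAlive are hypothetical, and applying the same five-second value to both the idle time and the probe interval is an assumption made for illustration. The extended options are not available on every platform, so the helper checks for them first.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

public class KeepAliveSketch {

    /**
     * Hypothetical helper: enables SO_KEEPALIVE and, where the platform
     * supports the extended options, sets both the idle time before the
     * first probe and the interval between probes to the given seconds.
     * Returns true if the timing options were applied, false if only
     * SO_KEEPALIVE could be set (OS defaults then govern the timing).
     */
    static boolean enableKeepAlive(Socket socket, int seconds) throws IOException {
        socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);

        Set<SocketOption<?>> supported = socket.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)) {
            return false;
        }

        // Assumption: one value for both settings, well below the
        // 350-second default NLB idle timeout quoted above.
        socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, seconds);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, seconds);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Options can be set before the socket is connected.
        try (Socket socket = new Socket()) {
            boolean tuned = enableKeepAlive(socket, 5);
            System.out.println("SO_KEEPALIVE=" + socket.getKeepAlive()
                    + ", timing options applied=" + tuned);
        }
    }
}
```

Any value comfortably below the load balancer's idle timeout should keep the flow tracked; five seconds is simply the interval I mentioned above.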

On Thu, Aug 7, 2025 at 9:20 AM Oleg Kalnichevski <ol...@apache.org> wrote:
>
> >>
> >> What I'd like to know is:
> >>
> >> 1. Can we do anything to improve this race condition?
> >
> > Please try this change-set:
> >
> > https://github.com/apache/httpcomponents-core/pull/543
>
> I should reduce the window of this race condition somewhat.
>
> Oleg