Re: Stale connection reuse in the async client

Ryan Schmitt Thu, 07 Aug 2025 11:44:54 -0700

I learned something very important, which explains why I'm seeing a lot of
reports of TCP resets specifically:

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout

> For each TCP request that a client makes through a Network Load Balancer,
> the state of that connection is tracked. If no data is sent through the
> connection by either the client or target for longer than the idle timeout,
> the connection is no longer tracked. If a client or target sends data after
> the idle timeout period elapses, the client receives a TCP RST packet to
> indicate that the connection is no longer valid.
>
> The default idle timeout value for TCP flows is 350 seconds, but can be
> updated to any value between 60-6000 seconds. Clients or targets can use
> TCP keepalive packets to restart the idle timeout. Keepalive packets sent
> to maintain TLS connections can't contain data or payload.

This is nasty: the client can't possibly know that it has a stale
connection until it sends a request, and then the error it gets
("Connection reset by peer") is both highly generic and, unlike
RequestNotExecutedException, not obviously safe to retry. The new TCP
Keep-Alive options should eliminate this failure mode, and as I write this
I'm publishing a change to enable a five-second keep-alive interval on all
clients. This will also reduce the occurrence of the (hypothesized)
Lambda-specific race condition, since there's no connection closure race if
the connections don't get closed in the first place.
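
For illustration, here is a minimal sketch of what enabling TCP keep-alive
looks like at the plain JDK socket level. It is not the client's own
configuration API. It assumes JDK 11 or later and a platform where the
jdk.net.ExtendedSocketOptions timing options are supported. The class name,
method name, and parameter values are hypothetical.

```java
import java.io.IOException;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper, not part of HttpClient or HttpCore.
final class KeepAliveSketch {

    /**
     * Turns on SO_KEEPALIVE and, where the platform supports it, tunes the
     * probe timing. Returns true only if all three timing options were set.
     */
    static boolean enableKeepAlive(SocketChannel ch, int idleSecs,
                                   int intervalSecs, int probes) throws IOException {
        // Portable switch: without it the timing options have no effect.
        ch.setOption(StandardSocketOptions.SO_KEEPALIVE, true);

        // The timing options are platform-dependent, so check before setting.
        Set<SocketOption<?>> supported = ch.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            return false; // OS defaults apply (often hours of idle time)
        }
        // Idle time before the first probe, in seconds.
        ch.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSecs);
        // Gap between unanswered probes, in seconds.
        ch.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSecs);
        // Unanswered probes before the connection is declared dead.
        ch.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probes);
        return true;
    }
}
```

Calling enableKeepAlive(channel, 5, 5, 3) before connecting would send the
first probe after five idle seconds. Any idle value comfortably below the
load balancer's idle timeout (350 seconds by default, per the quote above)
should keep the flow tracked.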

Thanks for the PR, I'll test my reproducer against it. An issue I noticed
is that it's apparently not possible to read bytes _and_ endOfStream in a
single read operation, which for us means that we can't discover that the
connection has been closed until the next event loop iteration (and by then
the connection might have been leased out again). Would it be safe to
perform a second read that returns either 0 bytes or -1 (endOfStream)?
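
To make the question concrete, here is a rough sketch of the second-read
idea in plain java.nio terms. It does not use HttpCore's internals, and
whether this is safe inside the reactor is exactly what I'm asking. It
assumes a non-blocking channel, so that read() returns 0 when nothing is
pending. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, not part of HttpCore.
final class EofProbeSketch {

    /** Bytes consumed in this pass and whether end-of-stream was observed. */
    static final class ReadResult {
        final int bytes;
        final boolean endOfStream;

        ReadResult(int bytes, boolean endOfStream) {
            this.bytes = bytes;
            this.endOfStream = endOfStream;
        }
    }

    /**
     * Reads until the channel reports 0 (nothing pending) or -1 (closed),
     * so a close that arrived right behind the data is seen in the same
     * pass instead of on the next event loop iteration.
     */
    static ReadResult readUntilDrained(ReadableByteChannel ch, ByteBuffer dst)
            throws IOException {
        int total = 0;
        // A full buffer makes read() return 0 even at end-of-stream, so the
        // probe is only meaningful while dst still has room.
        while (dst.hasRemaining()) {
            int n = ch.read(dst);
            if (n < 0) {
                return new ReadResult(total, true);  // peer closed
            }
            if (n == 0) {
                break;                               // nothing more right now
            }
            total += n;
        }
        return new ReadResult(total, false);
    }
}
```

If endOfStream comes back true, the connection could be marked closed before
it is returned to the pool. If the buffer fills up first, the close still
goes unnoticed until the next read.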

On Thu, Aug 7, 2025 at 9:20 AM Oleg Kalnichevski <[email protected]> wrote:

>
> >>
> >> What I'd like to know is:
> >>
> >> 1. Can we do anything to improve this race condition?
> >
>
> Please try this change-set:
>
> https://github.com/apache/httpcomponents-core/pull/543
>
> It should reduce the window of this race condition somewhat.
>
> Oleg