Thanks for the tips.

After digging into it, the cause turns out to be the nginx ingress controller.
Nginx terminates the gRPC connections and then load balances requests
to the pods underneath it. So while nginx respects the server's connection
timeout, it doesn't pass it on to the client, which therefore never resets
its connections and just hangs on to an open connection forever. There's a
(closed) bug in the ingress-nginx tracker about this exact problem:
https://github.com/kubernetes/ingress-nginx/issues/4402
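
For reference, the server's connection timeout here is the maximum
connection age that Antoine describes below. A minimal sketch of the Python
equivalent, assuming the grpcio package and its string keys
"grpc.max_connection_age_ms" and "grpc.max_connection_age_grace_ms" for the
two C-core options; the helper name and the 5 minute / 30 second values are
made up for illustration:

```python
def max_connection_age_options(max_age_seconds, grace_seconds):
    """Build gRPC server options that cap how long a connection may live.

    Hypothetical helper: it only converts seconds to the millisecond
    values that the two channel arguments expect.
    """
    return [
        # After this age the server starts shutting the connection down.
        ("grpc.max_connection_age_ms", int(max_age_seconds * 1000)),
        # Extra time granted to in-flight RPCs before the connection is cut.
        ("grpc.max_connection_age_grace_ms", int(grace_seconds * 1000)),
    ]


options = max_connection_age_options(max_age_seconds=300, grace_seconds=30)

# Assumed usage with grpcio (left as a comment so the sketch runs without it):
#   server = grpc.server(futures.ThreadPoolExecutor(max_workers=10),
#                        options=options)
```

With nginx in between, this only recycles the nginx-to-pod connections; the
client-to-nginx connection is unaffected, which is the problem described above.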

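Since the proxy never tells the client to reconnect, one possible workaround
is to recycle the channel on the client side. This is a rough sketch only,
not something I have verified against this setup: the RecyclingChannel class,
the factory callable and the clock parameter are all hypothetical names, and
the only thing assumed about a channel is that it has a close() method.

```python
import threading
import time


class RecyclingChannel:
    """Hands out a channel and replaces it once it is older than max_age_seconds."""

    def __init__(self, factory, max_age_seconds, clock=time.monotonic):
        self._factory = factory        # hypothetical: callable returning a new channel
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the age check can be tested
        self._lock = threading.Lock()
        self._channel = None
        self._created_at = None

    def get(self):
        """Return the current channel, rebuilding it first if it has expired."""
        with self._lock:
            now = self._clock()
            expired = (
                self._channel is not None
                and now - self._created_at >= self._max_age
            )
            if expired:
                # Closing drops the old connection; a freshly created channel
                # has to resolve the target name again when it connects.
                self._channel.close()
                self._channel = None
            if self._channel is None:
                self._channel = self._factory()
                self._created_at = now
            return self._channel


# Assumed usage with grpcio (hostname is a placeholder):
#   channels = RecyclingChannel(
#       lambda: grpc.insecure_channel("service-b.example.internal:443"), 300)
#   stub = MyServiceStub(channels.get())
```

Closing a channel abruptly cancels any calls still in flight on it, so a real
version would need to let those drain before closing.
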
Mark


On Wed, Apr 19, 2023 at 2:46 AM Antoine Tollenaere <
[email protected]> wrote:

> At least in Java, Go, and C, the DNS resolver refreshes when an underlying
> transport (the TCP connection) is closed. The transport can be closed for a
> variety of reasons. Most likely in your scenario, the traffic volume has an
> influence on whether connections are being closed on the AWS ELB side,
> which triggers DNS re-resolution in the high-volume case.
>
> One way to force periodic DNS re-resolution on the client is to set the
> maximum connection age on the gRPC server, which then tells clients to
> reconnect once a connection is too old. In C it's done via the channel
> options GRPC_ARG_MAX_CONNECTION_AGE_MS and
> GRPC_ARG_MAX_CONNECTION_AGE_GRACE_MS documented here:
>
>
> https://grpc.github.io/grpc/core/group__grpc__arg__keys.html#gabd3a16f46ad2cb5f06064bb607df7b5b
>
> https://grpc.github.io/grpc/core/group__grpc__arg__keys.html#gaf4574abe94c339c6f21163bca6e7b6b7
>
>
> There are equivalents for other languages. This will cause a bit of
> connection churn, so you probably don't want to set the maximum age too
> low. Another option would be to close connections regularly at the load
> balancer, in your AWS ELB configuration. I'm not sure whether AWS ELBs
> provide that as an option.
>
> Hope this helps,
> Antoine.
>
> On Tue, Apr 18, 2023 at 7:23 PM Mark Robinson <[email protected]> wrote:
>
>> Hi,
>>
>> I have a problem where two services communicate over gRPC, but the client
>> won't update its endpoint connection information when DNS updates.
>>
>> Very briefly, the architecture is
>>
>> http -> [AWS LB] -> [Service A] -> (grpc) [AWS ELB] -> [Service B]
>>
>> When I change the DNS information for service B, with the intent of
>> sending it through a physically different ELB, service A won't change to
>> point to the new IP address at low traffic volume.
>>
>> If I increase the traffic volume, it will switch fairly quickly, on the
>> order of minutes. However, if I keep the traffic volume low (<2 RPS/pod),
>> it'll stick on the old connection for hours if not forever.  The only
>> solution I've found is to restart all of service A, but that's not a great
>> solution.
>>
>> Does anyone know what might be going on here?
>>
>> Mark
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "grpc.io" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/grpc-io/1042f275-b01f-49d3-b39c-4edcfac34a84n%40googlegroups.com
>>
>
