Hi Chen

> Given that the client is doing client-side LB with round_robin, is setting max_connection_age on the server-side the right way to solve this problem? Will clients be able to refresh and reconnect automatically, or do we need to recreate the client (the underlying channel) periodically?

I set max_connection_age on the server side and it works well. Nothing else needs to be done on the client side. When max_connection_age is reached, the server sends a GOAWAY signal to the client. Each time the client receives a GOAWAY, it automatically refreshes its DNS resolution and creates connections to the new servers as well as a replacement for the connection that was closed.
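
For reference, a minimal sketch of that server-side setting with the grpc Ruby gem could look like this (the values are only examples, and ExampleServiceImpl is a hypothetical implementation of the ExampleService used elsewhere in this thread):

  require 'grpc'

  # Minimal sketch: a server that closes client connections after ~5 minutes,
  # giving in-flight RPCs up to 30 seconds to finish before the connection is torn down.
  server = GRPC::RpcServer.new(
    server_args: {
      'grpc.max_connection_age_ms'       => 5 * 60 * 1000,
      'grpc.max_connection_age_grace_ms' => 30 * 1000
    }
  )
  server.add_http2_port('0.0.0.0:50051', :this_port_is_insecure)
  server.handle(ExampleServiceImpl.new)  # hypothetical handler for ExampleService::Service
  server.run_till_terminated
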
> Also, the GOAWAY signal is random. Do client implementations need to handle this in particular?

What do you mean exactly? I'm not sure I can answer that point.

Regards

*Emmanuel Delmas*
Backend Developer
CSE Member
LinkedIn <https://www.linkedin.com/in/emmanueldelmasisep/>
19 rue Blanche, 75009 Paris, France

On Wed, Sep 1, 2021 at 01:43, Chen Song <[email protected]> wrote:

> I want to follow up on this thread, as we have similar requirements (force clients to refresh server addresses from the DNS resolver as new pods are launched on K8s), but our client is in Python.
>
> Given that the client is doing client-side LB with round_robin, is setting max_connection_age on the server-side the right way to solve this problem? Will clients be able to refresh and reconnect automatically, or do we need to recreate the client (the underlying channel) periodically?
> Also, the GOAWAY signal is random. Do client implementations need to handle this in particular?
>
> Chen
>
> On Wednesday, December 23, 2020 at 4:50:31 AM UTC-5 Emmanuel Delmas wrote:
>
>> > Just curious, how was it determined that the GOAWAY frame wasn't received? Also, what are your values of MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE?
>>
>> MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE were infinite, but this week I changed MAX_CONNECTION_AGE to 5 minutes.
>>
>> I followed this documentation to display gRPC logs and see the GOAWAY signal:
>> https://github.com/grpc/grpc/blob/v1.25.x/TROUBLESHOOTING.md
>> https://github.com/grpc/grpc/blob/master/doc/environment_variables.md
>>
>> To reproduce the error, I set up a channel without round robin load balancing (only one subchannel):
>> ExampleService::Stub.new("headless-test-grpc-master.test-grpc.svc.cluster.local:50051", :this_channel_is_insecure, timeout: 5)
>> Then I repeatedly kill the server pod my client is connected to. When I see in the logs that the GOAWAY signal is received, a reconnection occurs without any error in my requests. But when the reception of the GOAWAY signal is not logged, no reconnection occurs and I receive a bunch of DeadlineExceeded errors for several minutes.
>> The error still occurs even if I create a new channel. However, if I recreate the channel adding "dns:" at the beginning of the host, it works:
>> ExampleService::Stub.new("dns:headless-test-grpc-master.test-grpc.svc.cluster.local:50051", :this_channel_is_insecure, timeout: 5)
>> The opposite is also true: if I create the channel with "dns:" at the beginning of the host, it can lead to the same failure, and I am then able to create a working channel by removing the "dns:" at the beginning of the host.
>>
>> *Have you already heard of this kind of issue? Is there some cache in the DNS resolver?*
>>
>> > A guess: one possible thing to look for is whether IP packets to/from the pod's address stopped being forwarded, rendering the TCP connection to it a "black hole". In that case, a gRPC client will, by default, realize that a connection is bad only after the TCP connection times out (typically ~15 minutes). You may set keepalive parameters to notice the brokenness of such connections faster -- see references to keepalive in https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md for more details.
>>
>> Yes, it is as if requests go into a black hole. And as you said, it naturally fixes itself after around 15 minutes. I will add a client-side keepalive to make it shorter.
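>>
>> A minimal sketch of such a client-side keepalive with the grpc Ruby gem could look like this (the values are only examples, not a recommendation):
>>
>>   stub = ExampleService::Stub.new(
>>     "headless-test-grpc-master.test-grpc.svc.cluster.local:50051",
>>     :this_channel_is_insecure,
>>     timeout: 5,
>>     channel_args: {
>>       'grpc.keepalive_time_ms'              => 60_000, # send an HTTP/2 keepalive ping every 60s
>>       'grpc.keepalive_timeout_ms'           => 10_000, # treat the connection as dead if no ack within 10s
>>       'grpc.keepalive_permit_without_calls' => 1       # keep pinging even when no RPC is in flight
>>     }
>>   )
>>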
>> But even with 1 minute instead of 15, I need to find another workaround in order to avoid degraded service for my customers.
>>
>> Thank you.
>>
>> On Tuesday, December 22, 2020 at 9:34:32 PM UTC+1, [email protected] wrote:
>>
>>> > It happens that sometimes, the GOAWAY signal isn't received by the client.
>>>
>>> Just curious, how was it determined that the GOAWAY frame wasn't received? Also, what are your values of MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE?
>>>
>>> A guess: one possible thing to look for is whether IP packets to/from the pod's address stopped being forwarded, rendering the TCP connection to it a "black hole". In that case, a gRPC client will, by default, realize that a connection is bad only after the TCP connection times out (typically ~15 minutes). You may set keepalive parameters to notice the brokenness of such connections faster -- see references to keepalive in https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md for more details.
>>>
>>> On Tuesday, December 22, 2020 at 11:30:44 AM UTC-8 Emmanuel Delmas wrote:
>>>
>>>> Thank you. I've set up MAX_CONNECTION_AGE and it seems to work well.
>>>>
>>>> I was looking for a way to refresh the name resolution because I'm facing another issue.
>>>> It happens that sometimes the GOAWAY signal isn't received by the client.
>>>> In this case, I receive a bunch of DeadlineExceeded errors, as the client keeps sending messages to a deleted Kubernetes pod.
>>>> I wanted to trigger a refresh at that point, but I understand it is not possible.
>>>>
>>>> Have you ever run into this kind of issue?
>>>> Do you have any advice for handling a GOAWAY signal that is never received?
>>>>
>>>> On Monday, December 21, 2020 at 7:42:17 PM UTC+1, [email protected] wrote:
>>>>
>>>>> > "But when I create new pods after the connection or a reconnection, calls are not load balanced on these new servers."
>>>>>
>>>>> Can you elaborate a bit on what exactly is done here and the expected behavior?
>>>>>
>>>>> One thing to note about gRPC's client channel/stub is that, in general, a client will not refresh name resolution unless it encounters a problem with the current connection(s) that it has. So, for example, if the following events happen:
>>>>> 1) the client stub resolves headless-test-grpc-master.test-grpc.svc.cluster.local in DNS, to addresses 1.1.1.1, 2.2.2.2, and 3.3.3.3
>>>>> 2) the client stub establishes connections to 1.1.1.1, 2.2.2.2, and 3.3.3.3, and begins round-robining RPCs across them
>>>>> 3) a new host, 4.4.4.4, starts up and is added behind the headless-test-grpc-master.test-grpc.svc.cluster.local DNS name
>>>>>
>>>>> Then the client will continue to just round-robin its RPCs across 1.1.1.1, 2.2.2.2, and 3.3.3.3 indefinitely -- as long as it doesn't encounter a problem with those connections. It will only re-query the DNS, and so learn about 4.4.4.4, if it encounters a problem.
>>>>>
>>>>> There's some possibly interesting discussion about this behavior in https://github.com/grpc/grpc/issues/12295 and in https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md.
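>>>>>
>>>>> To make step 1 concrete: a headless Kubernetes service publishes one DNS A record per pod, and that is the address list the channel's resolver sees at connection time. A minimal Ruby sketch using only the standard library (the service name is the one from this thread; the printed addresses are made up):
>>>>>
>>>>>   require 'resolv'
>>>>>
>>>>>   # Each backing pod shows up as one A record behind the headless service name.
>>>>>   addresses = Resolv::DNS.open do |dns|
>>>>>     dns.getaddresses('headless-test-grpc-master.test-grpc.svc.cluster.local').map(&:to_s)
>>>>>   end
>>>>>   p addresses  # e.g. ["10.0.1.12", "10.0.2.7", "10.0.3.21"]
>>>>>
>>>>> A channel created against that name snapshots this list; a pod added afterwards is only picked up once the client re-resolves, e.g. when a connection breaks or a GOAWAY is received.
>>>>>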
>>>>> On Thursday, December 3, 2020 at 8:57:03 AM UTC-8 Emmanuel Delmas wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> *Question*
>>>>>> I'm wondering how to refresh the list of IPs, and so update the subchannel list, after creating a gRPC channel in Ruby using DNS resolution (which created several subchannels).
>>>>>>
>>>>>> *Context*
>>>>>> I set up gRPC communication between our services in a Kubernetes environment two years ago, but we are facing issues after pod restarts.
>>>>>>
>>>>>> I've set up a Kubernetes headless service (in order to get all pod IPs from the DNS).
>>>>>> I've managed to use load balancing with the following piece of code:
>>>>>> stub = ExampleService::Stub.new("headless-test-grpc-master.test-grpc.svc.cluster.local:50051", :this_channel_is_insecure, timeout: 5, channel_args: {'grpc.lb_policy_name' => 'round_robin'})
>>>>>>
>>>>>> But when I create new pods after the connection, or after a reconnection, calls are not load balanced onto these new servers.
>>>>>> That's why I'm wondering what I should do to make the gRPC resolver refresh the list of IPs and create the expected new subchannels.
>>>>>>
>>>>>> Is this achievable? Which configuration should I use?
>>>>>>
>>>>>> Thanks for your help
>>>>>>
>>>>>> *Emmanuel Delmas*
>>>>>> Backend Developer
>>>>>> CSE Member
>>>>>> https://github.com/papa-cool
