Hi Emmanuel,

I have some follow-up questions.

> I set max_connection_age on the server side and it works well. Nothing else to do on the client side. When max_connection_age is reached, a GOAWAY signal is sent to the client. Each time a client receives a GOAWAY signal, it automatically refreshes its DNS and creates connections to any new servers as well as a replacement for the one that was closed.

May I ask how you monitor this? Did you verify it on the client side with gRPC debug-level logging? Or did you have your client program send gRPC requests continuously and verify it on the server side?

Also, what happens if the GOAWAY signal is received during an in-flight request (e.g., a long-lived read)? Does the read fail, or does it complete as long as max_connection_age_grace is long enough?
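
For reference, this is roughly how I was planning to verify it on my side (a sketch based on the GRPC_VERBOSITY / GRPC_TRACE environment variables from the troubleshooting doc linked below; I've written it against your Ruby example since that's what's in this thread -- the same variables should apply to my Python client -- and the exact tracer names are my guess):

  # Enable gRPC core debug logs before the library is loaded, so the GOAWAY
  # frame and the subsequent re-resolution/reconnection show up in the output.
  # See grpc/doc/environment_variables.md for the full list of tracers.
  ENV['GRPC_VERBOSITY'] = 'DEBUG'
  ENV['GRPC_TRACE'] = 'http,connectivity_state'  # 'http' traces HTTP/2 frames such as GOAWAY

  require 'grpc'

  # ... then create the stub and send requests in a loop, watching the logs
  # for the GOAWAY and for connections to the new pod addresses.

Is that more or less what you did?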

Best,
Chen

On Wednesday, September 1, 2021 at 8:18:39 AM UTC-4 Emmanuel DELMAS wrote:

> Hi Chen
>
> > Given that the client is doing client-side LB with round_robin, is setting max_connection_age on the server side the right way to solve this problem? Will clients be able to refresh and reconnect automatically, or do we need to recreate the client (the underlying channel) periodically?
>
> I set max_connection_age on the server side and it works well. Nothing else to do on the client side. When max_connection_age is reached, a GOAWAY signal is sent to the client. Each time a client receives a GOAWAY signal, it automatically refreshes its DNS and creates connections to any new servers as well as a replacement for the one that was closed.
>
> > Also, the GOAWAY signal arrives at a random time. Do client implementations need to handle this in particular?
>
> What do you mean exactly? I'm not sure I can answer this point.
>
> Regards
>
> Emmanuel Delmas
> Backend Developer
> CSE Member
> LinkedIn: https://www.linkedin.com/in/emmanueldelmasisep/
> 19 rue Blanche, 75009 Paris, France
>
> On Wed, Sep 1, 2021 at 01:43 Chen Song <[email protected]> wrote:
>
>> I want to follow up on this thread, as we have a similar requirement (forcing clients to refresh server addresses from the DNS resolver as new pods are launched on K8s), but our client is in Python.
>>
>> Given that the client is doing client-side LB with round_robin, is setting max_connection_age on the server side the right way to solve this problem? Will clients be able to refresh and reconnect automatically, or do we need to recreate the client (the underlying channel) periodically?
>>
>> Also, the GOAWAY signal arrives at a random time. Do client implementations need to handle this in particular?
>>
>> Chen
>>
>> On Wednesday, December 23, 2020 at 4:50:31 AM UTC-5 Emmanuel Delmas wrote:
>>
>>> > Just curious, how has this been determined that the GOAWAY frame wasn't received? Also what are your values of MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE?
>>>
>>> MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE were infinite, but this week I changed MAX_CONNECTION_AGE to 5 minutes.
>>>
>>> I followed this documentation to display the gRPC logs and see the GOAWAY signal:
>>> https://github.com/grpc/grpc/blob/v1.25.x/TROUBLESHOOTING.md
>>> https://github.com/grpc/grpc/blob/master/doc/environment_variables.md
>>>
>>> To reproduce the error, I set up a channel without round-robin load balancing (only one subchannel):
>>> ExampleService::Stub.new("headless-test-grpc-master.test-grpc.svc.cluster.local:50051", :this_channel_is_insecure, timeout: 5)
>>>
>>> Then I repeatedly kill the server pod my client is connected to. When I see in the logs that the GOAWAY signal was received, a reconnection occurs without any error in my requests. But when the reception of the GOAWAY signal is not logged, no reconnection occurs and I receive a bunch of DeadlineExceeded errors for several minutes.
>>>
>>> The error still occurs even if I create a new channel. However, if I recreate the channel adding "dns:" at the beginning of the host, it works:
>>> ExampleService::Stub.new("dns:headless-test-grpc-master.test-grpc.svc.cluster.local:50051", :this_channel_is_insecure, timeout: 5)
>>>
>>> The opposite is also true: if I create the channel with "dns:" at the beginning of the host, the same failure can occur, and I can then create a working channel by removing the "dns:" prefix.
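>>>
>>> For reference, the 5-minute MAX_CONNECTION_AGE mentioned above is set through the server channel args, roughly like this (a simplified sketch rather than my exact code; the 30-second grace value is just an example, and ExampleServiceImpl is a placeholder for the real service class):
>>>
>>>   require 'grpc'
>>>
>>>   server = GRPC::RpcServer.new(
>>>     server_args: {
>>>       'grpc.max_connection_age_ms'       => 5 * 60 * 1000, # send GOAWAY after ~5 minutes
>>>       'grpc.max_connection_age_grace_ms' => 30 * 1000      # give in-flight RPCs 30s to finish
>>>     }
>>>   )
>>>   server.add_http2_port('0.0.0.0:50051', :this_port_is_insecure)
>>>   server.handle(ExampleServiceImpl)
>>>   server.run_till_terminated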
>>>
>>> Have you ever heard of this kind of DNS-related issue? Is there some cache in the DNS resolver?
>>>
>>> > A guess: one possible thing to look for is whether IP packets to/from the pod's address stopped being forwarded, rendering the TCP connection to it a "black hole". In that case, a gRPC client will, by default, realize that a connection is bad only after the TCP connection times out (typically ~15 minutes). You may set keepalive parameters to notice the brokenness of such connections faster -- see references to keepalive in https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md for more details.
>>>
>>> Yes, it is as if requests go into a black hole. And as you said, it naturally fixes itself after around 15 minutes. I will add a client-side keepalive to make it shorter. But even with 1 minute instead of 15, I need to find another workaround to avoid degraded service for my customers.
>>>
>>> Thank you.
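>>>
>>> P.S. For the keepalive, I plan to set it through the client channel args, roughly along these lines (a sketch only; the values are simply what I intend to try first):
>>>
>>>   stub = ExampleService::Stub.new(
>>>     "headless-test-grpc-master.test-grpc.svc.cluster.local:50051",
>>>     :this_channel_is_insecure,
>>>     timeout: 5,
>>>     channel_args: {
>>>       'grpc.lb_policy_name'                 => 'round_robin',
>>>       'grpc.keepalive_time_ms'              => 60_000, # ping the server every minute
>>>       'grpc.keepalive_timeout_ms'           => 10_000, # consider the connection dead 10s after an unanswered ping
>>>       'grpc.keepalive_permit_without_calls' => 1       # keep pinging even when no RPC is in flight
>>>     }
>>>   )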
>>>
>>> On Tuesday, December 22, 2020 at 21:34:32 UTC+1, [email protected] wrote:
>>>
>>>> > It happens that sometimes, the GOAWAY signal isn't received by the client.
>>>>
>>>> Just curious, how was it determined that the GOAWAY frame wasn't received? Also, what are your values of MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE?
>>>>
>>>> A guess: one possible thing to look for is whether IP packets to/from the pod's address stopped being forwarded, rendering the TCP connection to it a "black hole". In that case, a gRPC client will, by default, realize that a connection is bad only after the TCP connection times out (typically ~15 minutes). You may set keepalive parameters to notice the brokenness of such connections faster -- see references to keepalive in https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md for more details.
>>>>
>>>> On Tuesday, December 22, 2020 at 11:30:44 AM UTC-8 Emmanuel Delmas wrote:
>>>>
>>>>> Thank you. I've set up MAX_CONNECTION_AGE and it seems to work well.
>>>>>
>>>>> I was looking for a way to refresh the name resolution because I'm facing another issue. It happens that sometimes, the GOAWAY signal isn't received by the client. In that case, I receive a bunch of DeadlineExceeded errors, the client still sending messages to a deleted Kubernetes pod. I wanted to trigger a refresh at that point, but I understand it is not possible.
>>>>>
>>>>> Have you already run into this kind of issue? Do you have any advice for handling a GOAWAY signal that is never received?
>>>>>
>>>>> On Monday, December 21, 2020 at 19:42:17 UTC+1, [email protected] wrote:
>>>>>
>>>>>> > But when I create new pods after the connection or a reconnection, calls are not load balanced on these new servers.
>>>>>>
>>>>>> Can you elaborate a bit on what exactly is done here and what the expected behavior is?
>>>>>>
>>>>>> One thing to note about gRPC's client channel/stub is that, in general, a client will not refresh the name resolution process unless it encounters a problem with the current connection(s) that it has.
>>>>>>
>>>>>> So, for example, if the following events happen:
>>>>>> 1) the client stub resolves headless-test-grpc-master.test-grpc.svc.cluster.local in DNS to the addresses 1.1.1.1, 2.2.2.2, and 3.3.3.3
>>>>>> 2) the client stub establishes connections to 1.1.1.1, 2.2.2.2, and 3.3.3.3, and begins round-robining RPCs across them
>>>>>> 3) a new host, 4.4.4.4, starts up and is added behind the headless-test-grpc-master.test-grpc.svc.cluster.local DNS name
>>>>>>
>>>>>> then the client will continue to round-robin its RPCs across 1.1.1.1, 2.2.2.2, and 3.3.3.3 indefinitely -- as long as it doesn't encounter a problem with those connections. It will only re-query the DNS, and so learn about 4.4.4.4, if it encounters a problem.
>>>>>>
>>>>>> There's some possibly interesting discussion about this behavior in https://github.com/grpc/grpc/issues/12295 and in https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md.
>>>>>>
>>>>>> On Thursday, December 3, 2020 at 8:57:03 AM UTC-8 Emmanuel Delmas wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> Question
>>>>>>> I'm wondering how to refresh the IP list in order to update the subchannel list, after creating a gRPC channel in Ruby using DNS resolution (which created several subchannels).
>>>>>>>
>>>>>>> Context
>>>>>>> I set up gRPC communication between our services in a Kubernetes environment two years ago, but we are facing issues after pod restarts.
>>>>>>>
>>>>>>> I've set up a Kubernetes headless service (in order to get all pod IPs from the DNS), and I've managed to use load balancing with the following piece of code:
>>>>>>> stub = ExampleService::Stub.new("headless-test-grpc-master.test-grpc.svc.cluster.local:50051", :this_channel_is_insecure, timeout: 5, channel_args: {'grpc.lb_policy_name' => 'round_robin'})
>>>>>>>
>>>>>>> But when I create new pods after the connection or a reconnection, calls are not load balanced on these new servers. That's why I'm wondering what I should do to make the gRPC resolver refresh the list of IPs and create the expected new subchannels.
>>>>>>>
>>>>>>> Is this achievable? Which configuration should I use?
>>>>>>>
>>>>>>> Thanks for your help
>>>>>>>
>>>>>>> Emmanuel Delmas
>>>>>>> Backend Developer
>>>>>>> CSE Member
>>>>>>> https://github.com/papa-cool
