> Just curious, how has this been determined that the GOAWAY frame wasn't 
> received? Also what are your values of MAX_CONNECTION_AGE and 
> MAX_CONNECTION_AGE_GRACE ?

MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE were both infinite, but this 
week I changed MAX_CONNECTION_AGE to 5 minutes.
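For context, here is roughly how I set it on the Ruby server (just a sketch; 
the grpc.max_connection_age_ms channel arg is the one described in the A9 
proposal, and I leave the grace period at its default):

require 'grpc'

server = GRPC::RpcServer.new(
  server_args: {
    # MAX_CONNECTION_AGE = 5 minutes; MAX_CONNECTION_AGE_GRACE is left at
    # its default (infinite).
    'grpc.max_connection_age_ms' => 5 * 60 * 1000
  }
)
server.add_http2_port('0.0.0.0:50051', :this_port_is_insecure)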

I followed this documentation to display gRPC logs and see the GOAWAY 
signal.
https://github.com/grpc/grpc/blob/v1.25.x/TROUBLESHOOTING.md
https://github.com/grpc/grpc/blob/master/doc/environment_variables.md
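Roughly what I enable before starting the client (the exact GRPC_TRACE flags 
below are only an example taken from environment_variables.md; the variables 
must be set before the gRPC C core initializes, so in practice I export them 
in the shell):

ENV['GRPC_VERBOSITY'] = 'DEBUG'
ENV['GRPC_TRACE']     = 'connectivity_state,http'
require 'grpc'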
To reproduce the error, I set up a channel without round robin load 
balancing (only one subchannel):

ExampleService::Stub.new(
  "headless-test-grpc-master.test-grpc.svc.cluster.local:50051",
  :this_channel_is_insecure,
  timeout: 5
)
Then I repeatedly kill the server pod the client is connected to. When I see 
in the logs that the GOAWAY signal is received, a reconnection occurs and my 
requests go through without any error. But when the reception of the GOAWAY 
signal is not logged, no reconnection occurs and I receive a bunch of 
DeadlineExceeded errors for several minutes.
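For clarity, the failures surface roughly like this on my side (reusing the 
stub created above; the RPC and message names are only placeholders for my 
actual service):

begin
  stub.say_hello(HelloRequest.new(name: 'test'))
rescue GRPC::DeadlineExceeded => e
  # Raised once the 5 second call deadline expires while the channel is
  # still pointing at the deleted pod.
  warn "gRPC deadline exceeded: #{e.message}"
end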
The error still occurs even if I create a new channel. However, if I 
recreate the channel adding "dns:" at the beginning of the host, it works:

ExampleService::Stub.new(
  "dns:headless-test-grpc-master.test-grpc.svc.cluster.local:50051",
  :this_channel_is_insecure,
  timeout: 5
)

The opposite is also true: if I create the channel with "dns:" at the 
beginning of the host, it can lead to the same failure, and I can then get a 
working channel by removing the "dns:" from the beginning of the host.


*Have you already heard of this kind of issue? Is there some cache in the DNS 
resolver?*

> A guess: one possible thing to look for is if IP packets to/from the 
> pod's address stopped forwarding, rendering the TCP connection to it a 
> "black hole". In that case, a grpc client will, by default, realize that a 
> connection is bad only after the TCP connection times out (typically ~15 
> minutes). You may set keepalive parameters to notice the brokenness of such 
> connections faster -- see references to keepalive in 
> https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md for 
> more details.

Yes, it is as if requests go into a black hole. And as you said, it naturally 
fixes itself after around 15 minutes. I will add a client-side keepalive to 
make it shorter. But even with 1 minute instead of 15, I need to find another 
workaround in order to avoid degraded service for my customer.
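Concretely, I think the keepalive would look something like this (a sketch 
using the standard core channel args; the values are only examples around the 
1 minute mentioned above):

ExampleService::Stub.new(
  "headless-test-grpc-master.test-grpc.svc.cluster.local:50051",
  :this_channel_is_insecure,
  timeout: 5,
  channel_args: {
    'grpc.keepalive_time_ms'    => 60_000,  # ping after 60s without activity
    'grpc.keepalive_timeout_ms' => 20_000   # treat the connection as dead if the ping is not acked within 20s
  }
)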

Thank you.

On Tuesday, December 22, 2020 at 9:34:32 PM UTC+1, [email protected] wrote:

> > It happens that sometimes, the GOAWAY signal isn't received by the 
> client.
>
> Just curious, how has this been determined that the GOAWAY frame wasn't 
> received? Also what are your values of MAX_CONNECTION_AGE and 
> MAX_CONNECTION_AGE_GRACE ?
>
> A guess: one possible thing to look for is if IP packets to/from the pod's 
> address stopped forwarding, rendering the TCP connection to it a "black 
> hole". In that case, a grpc client will, by default, realize that a 
> connection is bad only after the TCP connection times out (typically ~15 
> minutes). You may set keepalive parameters to notice the brokenness of such 
> connections faster -- see references to keepalive in 
> https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md 
> for more details.
>
>
>
> On Tuesday, December 22, 2020 at 11:30:44 AM UTC-8 Emmanuel Delmas wrote:
>
>> Thank you. I've set up MAX_CONNECTION_AGE and it seems to work well.
>>
>> I was looking for a way to refresh the name resolution because I'm facing 
>> another issue.
>> It happens that sometimes, the GOAWAY signal isn't received by the client.
>> In this case, I receive a bunch of DeadlineExceeded errors, with the client 
>> still sending messages to a deleted Kubernetes pod.
>> I wanted to trigger a refresh at this time but I understand it is not 
>> possible.
>>
>> Have you already run into this kind of issue?
>> Do you have any advice for handling a GOAWAY signal that was never received?
>>
>> On Monday, December 21, 2020 at 7:42:17 PM UTC+1, [email protected] wrote:
>>
>>> > "But when I create new pods after the connection or a reconnection, 
>>> calls are not load balanced on these new servers."
>>>
>>> Can you elaborate a bit on what exactly is done here and the expected 
>>> behavior?
>>>
>>> In general, one thing to note about gRPC's client channel/stub is that 
>>> in general a client will not refresh the name resolution process unless it 
>>> encounters a problem with the current connection(s) that it has. So for 
>>> example if the following events happen:
>>> 1) client stub resolves 
>>> headless-test-grpc-master.test-grpc.svc.cluster.local in DNS, to addresses 
>>> 1.1.1.1, 2.2.2.2, and 3.3.3.3
>>> 2) client stub establishes connections to 1.1.1.1, 2.2.2.2, and 3.3.3.3, 
>>> and begins round robining RPCs across them
>>> 3) a new host, 4.4.4.4, starts up, and is added behind the 
>>> headless-test-grpc-master.test-grpc.svc.cluster.local DNS name
>>>
>>> Then the client will continue to just round robin its RPCs across 
>>> 1.1.1.1, 2.2.2.2, and 3.3.3.3 indefinitely -- so long as it doesn't 
>>> encounter a problem with those connections. It will only re-query the DNS, 
>>> and so learn about 4.4.4.4, if it encounters a problem.
>>>
>>> There's some possibly interesting discussion about this behavior in 
>>> https://github.com/grpc/grpc/issues/12295 and in 
>>> https://github.com/grpc/proposal/blob/master/A9-server-side-conn-mgt.md.
>>>
>>> On Thursday, December 3, 2020 at 8:57:03 AM UTC-8 Emmanuel Delmas wrote:
>>>
>>>> Hi
>>>>
>>>> *Question*
>>>> I'm wondering how to refresh the IP list in order to update the 
>>>> subchannel list after creating a gRPC channel in Ruby using DNS 
>>>> resolution (which created several subchannels).
>>>>
>>>> *Context*
>>>> I set up gRPC communication between our services in a Kubernetes 
>>>> environment two years ago, but we are facing issues after pods restart.
>>>>
>>>> I've setup a Kubernetes headless service (in order to get all pod IPs 
>>>> from the DNS).
>>>> I've managed to use load balancing with the following piece of code.
>>>> stub = ExampleService::Stub.new(
>>>>   "headless-test-grpc-master.test-grpc.svc.cluster.local:50051",
>>>>   :this_channel_is_insecure,
>>>>   timeout: 5,
>>>>   channel_args: { 'grpc.lb_policy_name' => 'round_robin' }
>>>> )
>>>>
>>>> But when I create new pods after the connection or a reconnection, 
>>>> calls are not load balanced on these new servers.
>>>> That's why I'm wondering what I should do to make the gRPC resolver 
>>>> refresh the list of IPs and create the expected new subchannels.
>>>>
>>>> Is it something achievable? Which configuration should I use?
>>>>
>>>> Thanks for your help
>>>>
>>>> *Emmanuel Delmas* 
>>>> Backend Developer
>>>> CSE Member
>>>> https://github.com/papa-cool
>>>>
>>>>  

