Keepalive should work here. You'll have to configure a few other params if you have long-lived streams with low activity. Along with keepalive_permit_without_calls, you may have to configure max_pings_without_data or min_sent_ping_interval_without_data too, and on the server side you may also have to configure min_recv_ping_interval_without_data. See the details in this document: https://github.com/grpc/grpc/blob/master/doc/keepalive.md
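In gRPC Python those settings are passed as channel options; a minimal sketch (the values are illustrative, not recommendations, and the target is a placeholder):

```python
import grpc

# Client-side keepalive channel args from the discussion above.
# The keepalive.md doc explains how these interact with server-side
# enforcement (min_recv_ping_interval_without_data).
options = [
    ("grpc.keepalive_time_ms", 10000),           # send a keepalive ping every 10s
    ("grpc.keepalive_timeout_ms", 5000),         # wait 5s for the ping ack
    ("grpc.keepalive_permit_without_calls", 1),  # ping even with no active RPCs
    ("grpc.http2.max_pings_without_data", 0),    # allow pings without data frames
]
channel = grpc.insecure_channel("localhost:50051", options=options)
```

Note that an overly aggressive client (as in the 1000 ms config below) can trip the server's ping-rate enforcement and get the connection closed with a GOAWAY, which is why the server-side args usually need to be tuned together with the client's.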
On Monday, August 20, 2018 at 6:23:22 AM UTC-7, [email protected] wrote:

> Hey Srini,
>
> I've tested a pretty aggressive keepalive config with the following parameters:
>
>     'grpc.http2.min_time_between_pings_ms': 1000,
>     'grpc.keepalive_time_ms': 1000,
>     'grpc.keepalive_permit_without_calls': 1
>
> Is there anything I'm missing? Ideally I would like this solution to handle both explicit RST and also things like firewalls blackholing inactive connections (which we've seen happen in the past), so getting keepalive to detect a dead connection would be great.
>
> Thanks,
> Alysha
>
> On Friday, August 17, 2018 at 8:17:43 PM UTC-4, Srini Polavarapu wrote:
>
>> Hi Alysha,
>>
>> How did you confirm that the client is going into backoff and that it is indeed receiving a RST when nginx goes away? Have you looked at the logs gRPC generates when this happens? One possibility is that nginx doesn't send a RST and the client doesn't know that the connection is broken until a TCP timeout occurs. Using keepalive will help in this case.
>>
>> You can try using wait_for_ready=false <https://github.com/grpc/grpc/blob/5098508d2d41a116113f7e333c516cd9ef34a943/doc/wait-for-ready.md> so the call fails immediately and you can retry.
>>
>> A recent PR allows you to reset the backoff period: https://github.com/grpc/grpc/pull/16225. It is experimental and doesn't have a Python or Ruby API, so it can't be of immediate help.
>>
>> On Friday, August 17, 2018 at 12:58:12 PM UTC-7, [email protected] wrote:
>>
>>> Hey Carl,
>>>
>>> This is with L7 nginx balancing; the reason we moved to nginx from L4 balancers was so we could do per-call balancing (instead of per-connection with L4).
>>>
>>> > In an ideal world, nginx would send a GOAWAY frame to both the client and the server, and allow all the RPCs to complete before tearing down the connection.
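The wait_for_ready suggestion can be sketched in Python, where it is a per-call argument. This example deliberately dials a port with no listener, so the fail-fast call returns immediately instead of queueing while the channel is in backoff (the method path and serializers are hypothetical placeholders):

```python
import grpc

# Nothing listens on this port, so a fail-fast call errors out right away
# instead of waiting for the channel to (never) become READY.
channel = grpc.insecure_channel("localhost:59999")
echo = channel.unary_unary(
    "/demo.Echo/Echo",                 # hypothetical method path
    request_serializer=lambda m: m,    # identity serializers for the sketch
    response_deserializer=lambda b: b,
)
try:
    echo(b"ping", timeout=2, wait_for_ready=False)
    status = None
except grpc.RpcError as err:
    status = err.code()  # expected: UNAVAILABLE, which the app can retry
```

With wait_for_ready=True the same call would instead sit in the queue until the deadline, which is the behavior Alysha was seeing during backoff.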
>>> I agree a GOAWAY would be better, but it seems like nginx doesn't do that (at least yet); they just RST the connection :(
>>>
>>> > The client knows how to reschedule an unstarted RPC onto a different connection, without returning an UNAVAILABLE.
>>>
>>> Even when we were using L4 it seemed like a GOAWAY from the Go server would put the Core clients in a backoff state instead of retrying immediately. The only solution that worked was round-robin over multiple connections and a slow-enough rolling restart so the connections could re-establish before the next one died.
>>>
>>> > When you say multiple connections to a single IP, does that mean multiple nginx instances listening on different ports?
>>>
>>> No, it's a pool of ~20 ingress nginx instances behind an L4 load balancer, so traffic looks like client -> L4 LB -> nginx L7 -> backend gRPC pod. The problem is that the L4 LB in front of nginx has a single public IP.
>>>
>>> > I'm most familiar with Java, which can actually do what you want. The normal way is to create a custom NameResolver that returns multiple addresses for a single address, which a RoundRobin load balancer will use.
>>>
>>> Yeah, I considered writing something similar in Core, but I was worried it wouldn't be adopted upstream because of the move to external LBs. It's very tough (impossible?) to add new resolvers to Ruby or Python without rebuilding the whole extension, and we're pretty worried about maintaining a fork of the C++ implementation. It's nice to hear the approach has some merit; I might experiment with it.
>>>
>>> Thanks,
>>> Alysha
>>>
>>> On Friday, August 17, 2018 at 3:42:31 PM UTC-4, Carl Mastrangelo wrote:
>>>
>>>> Hi Alysha,
>>>>
>>>> Do you know if nginx is balancing at L4 or L7? In an ideal world, nginx would send a GOAWAY frame to both the client and the server, and allow all the RPCs to complete before tearing down the connection.
>>>> The client knows how to reschedule an unstarted RPC onto a different connection, without returning an UNAVAILABLE.
>>>>
>>>> When you say multiple connections to a single IP, does that mean multiple nginx instances listening on different ports?
>>>>
>>>> I'm most familiar with Java, which can actually do what you want. The normal way is to create a custom NameResolver that returns multiple addresses for a single address, which a RoundRobin load balancer will use. It sounds like you aren't using Java, but since the implementations are all similar there may be a way to do so.
>>>>
>>>> On Friday, August 17, 2018 at 8:46:49 AM UTC-7, [email protected] wrote:
>>>>
>>>>> Hi grpc people!
>>>>>
>>>>> We have a setup where we're running a gRPC service (written in Go) on GKE, and we're accepting traffic from outside the cluster through nginx ingresses. Our clients are all using Core gRPC libraries (mostly Ruby) to make calls to the nginx ingress, which load-balances per-call to our backend pods.
>>>>>
>>>>> The problem we have with this setup is that whenever the nginx ingresses reload they drop all client connections, which results in spikes of Unavailable errors from our gRPC clients. There are many nginx ingresses, but they all share a single IP; the incoming TCP connections are routed through a Google Cloud L4 load balancer. Whenever an nginx reload closes a TCP connection, the gRPC subchannel treats the backend as unavailable and goes into backoff logic, even though there are many more nginx pods that may be available immediately to serve traffic. My understanding is that with multiple subchannels, even if one nginx ingress is restarted the others can continue to serve requests and we shouldn't see Unavailable errors.
>>>>> My question is: what is the best way to make gRPC Core establish multiple connections to a single IP, so we can have long-lived connections to multiple nginx ingresses?
>>>>>
>>>>> Possibilities we've considered:
>>>>>
>>>>> - DNS round-robin with multiple public IPs on a single A record - we've tested this and it works, but it requires us to manually administer the DNS records and run multiple L4 LBs.
>>>>>
>>>>> - DNS SRV records - it seems like we could have multiple SRV records with the same hostname, but in my testing this requires us to add a look-aside load balancer as well, and to enable the c-ares DNS resolver, which doesn't seem to be production-ready.
>>>>>
>>>>> - Hosting a look-aside load balancer - we could host our own LB service, but it's not clear to me how we would overcome this issue for the LB service itself, since it would be behind the same nginx ingresses. I haven't found great documentation on how to set this up either.
>>>>>
>>>>> - Connection pooling in the client - wrapping the Ruby gRPC channels in a library that explicitly establishes multiple channels, each with one subchannel. I've tried to write this, but it's tricky to implement at a high level, and I couldn't get it to perform as well during failures as the DNS round-robin approach.
>>>>>
>>>>> Are there options I missed? Is there any supported pattern for this? Has anyone deployed a similar architecture (many clients connecting through nginx on a single public IP)?
>>>>>
>>>>> Thanks,
>>>>> Alysha
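For reference, the last option (a client-side channel pool) can be sketched in a few lines of Python. This is an illustration, not a supported pattern; the "grpc.channel_id" arg is not an official option, just an arbitrary distinct channel arg, on the assumption that channels with differing args won't share a subchannel in C-core:

```python
import itertools
import grpc

class ChannelPool:
    """Round-robin over several independent channels to one target, so each
    gets its own TCP connection through the L4 LB (and, with luck, lands on
    a different nginx ingress)."""

    def __init__(self, target, size=4, options=None):
        # A distinct dummy arg per channel, assumed to prevent the channels
        # from sharing a single underlying subchannel/connection.
        self._channels = [
            grpc.insecure_channel(
                target, options=[("grpc.channel_id", i)] + (options or []))
            for i in range(size)
        ]
        self._next = itertools.cycle(self._channels)

    def channel(self):
        """Return the next channel in round-robin order."""
        return next(self._next)

pool = ChannelPool("localhost:50051", size=4)
# Generated stubs are bound to one channel, so pick a fresh one per call:
# stub = EchoStub(pool.channel())   # EchoStub is a hypothetical stub
```

As Alysha notes, the hard part isn't the rotation itself but handling failures well: a real pool would also need to watch connectivity state and skip channels stuck in backoff.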