Keepalive should work here. You'll have to configure a few other params if you have long-lived streams with low activity. Along with keepalive_permit_without_calls, you may have to configure max_pings_without_data or min_sent_ping_interval_without_data too, and on the server side you may also have to configure min_recv_ping_interval_without_data. See the details in this document: https://github.com/grpc/grpc/blob/master/doc/keepalive.md
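In gRPC Python those settings are passed as channel options; a minimal sketch (the values are illustrative, not recommendations, and the target is a placeholder):

```python
import grpc

# Client-side keepalive channel args from the discussion above.
# The keepalive.md doc explains how these interact with server-side
# enforcement (min_recv_ping_interval_without_data).
options = [
    ("grpc.keepalive_time_ms", 10000),           # send a keepalive ping every 10s
    ("grpc.keepalive_timeout_ms", 5000),         # wait 5s for the ping ack
    ("grpc.keepalive_permit_without_calls", 1),  # ping even with no active RPCs
    ("grpc.http2.max_pings_without_data", 0),    # allow pings without data frames
]
channel = grpc.insecure_channel("localhost:50051", options=options)
```

Note that an overly aggressive client (as in the 1000 ms config below) can trip the server's ping-rate enforcement and get the connection closed with a GOAWAY, which is why the server-side args usually need to be tuned together with the client's.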
On Monday, August 20, 2018 at 6:23:22 AM UTC-7, [email protected] wrote:

> Hey Srini,
>
> I've tested a pretty aggressive keepalive config with the following parameters:
>
>     'grpc.http2.min_time_between_pings_ms': 1000,
>     'grpc.keepalive_time_ms': 1000,
>     'grpc.keepalive_permit_without_calls': 1
>
> Is there anything I'm missing? Ideally I would like this solution to handle both explicit RST and also things like firewalls blackholing inactive connections (which we've seen happen in the past), so getting keepalive to detect a dead connection would be great.
>
> Thanks,
> Alysha
>
> On Friday, August 17, 2018 at 8:17:43 PM UTC-4, Srini Polavarapu wrote:
>
>> Hi Alysha,
>>
>> How did you confirm that the client is going into backoff and that it is indeed receiving a RST when nginx goes away? Have you looked at the logs gRPC generates when this happens? One possibility is that nginx doesn't send a RST and the client doesn't know that the connection is broken until a TCP timeout occurs. Using keepalive will help in this case.
>>
>> You can try using wait_for_ready=false <https://github.com/grpc/grpc/blob/5098508d2d41a116113f7e333c516cd9ef34a943/doc/wait-for-ready.md> so the call fails immediately and you can retry.
>>
>> A recent PR allows you to reset the backoff period: https://github.com/grpc/grpc/pull/16225. It is experimental and doesn't have a Python or Ruby API, so it can't be of immediate help.
>>
>> On Friday, August 17, 2018 at 12:58:12 PM UTC-7, [email protected] wrote:
>>
>>> Hey Carl,
>>>
>>> This is with L7 nginx balancing; the reason we moved to nginx from L4 balancers was so we could do per-call balancing (instead of per-connection with L4).
>>>
>>> > In an ideal world, nginx would send a GOAWAY frame to both the client and the server, and allow all the RPCs to complete before tearing down the connection.
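The wait_for_ready suggestion can be sketched in Python, where it is a per-call argument. This example deliberately dials a port with no listener, so the fail-fast call returns immediately instead of queueing while the channel is in backoff (the method path and serializers are hypothetical placeholders):

```python
import grpc

# Nothing listens on this port, so a fail-fast call errors out right away
# instead of waiting for the channel to (never) become READY.
channel = grpc.insecure_channel("localhost:59999")
echo = channel.unary_unary(
    "/demo.Echo/Echo",                 # hypothetical method path
    request_serializer=lambda m: m,    # identity serializers for the sketch
    response_deserializer=lambda b: b,
)
try:
    echo(b"ping", timeout=2, wait_for_ready=False)
    status = None
except grpc.RpcError as err:
    status = err.code()  # expected: UNAVAILABLE, which the app can retry
```

With wait_for_ready=True the same call would instead sit in the queue until the deadline, which is the behavior Alysha was seeing during backoff.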
>>> I agree a GOAWAY would be better, but it seems like nginx doesn't do that (at least yet); they just RST the connection :(
>>>
>>> > The client knows how to reschedule an unstarted RPC onto a different connection, without returning an UNAVAILABLE.
>>>
>>> Even when we were using L4 it seemed like a GOAWAY from the Go server would put the Core clients in a backoff state instead of retrying immediately. The only solution that worked was round-robin over multiple connections and a slow-enough rolling restart so the connections could re-establish before the next one died.
>>>
>>> > When you say multiple connections to a single IP, does that mean multiple nginx instances listening on different ports?
>>>
>>> No, it's a pool of ~20 ingress nginx instances behind an L4 load balancer, so traffic looks like client -> L4 LB -> nginx L7 -> backend gRPC pod. The problem is that the L4 LB in front of nginx has a single public IP.
>>>
>>> > I'm most familiar with Java, which can actually do what you want. The normal way is to create a custom NameResolver that returns multiple addresses for a single address, which a RoundRobin load balancer will use.
>>>
>>> Yeah, I considered writing something similar in Core, but I was worried it wouldn't be adopted upstream because of the move to external LBs. It's very tough (impossible?) to add new resolvers to Ruby or Python without rebuilding the whole extension, and we're pretty worried about maintaining a fork of the C++ implementation. It's nice to hear the approach has some merit; I might experiment with it.
>>>
>>> Thanks,
>>> Alysha
>>>
>>> On Friday, August 17, 2018 at 3:42:31 PM UTC-4, Carl Mastrangelo wrote:
>>>
>>>> Hi Alysha,
>>>>
>>>> Do you know if nginx is balancing at L4 or L7? In an ideal world, nginx would send a GOAWAY frame to both the client and the server, and allow all the RPCs to complete before tearing down the connection.
>>>> The client knows how to reschedule an unstarted RPC onto a different connection, without returning an UNAVAILABLE.
>>>>
>>>> When you say multiple connections to a single IP, does that mean multiple nginx instances listening on different ports?
>>>>
>>>> I'm most familiar with Java, which can actually do what you want. The normal way is to create a custom NameResolver that returns multiple addresses for a single address, which a RoundRobin load balancer will use. It sounds like you aren't using Java, but since the implementations are all similar there may be a way to do so.
>>>>
>>>> On Friday, August 17, 2018 at 8:46:49 AM UTC-7, [email protected] wrote:
>>>>
>>>>> Hi grpc people!
>>>>>
>>>>> We have a setup where we're running a gRPC service (written in Go) on GKE, and we're accepting traffic from outside the cluster through nginx ingresses. Our clients are all using Core gRPC libraries (mostly Ruby) to make calls to the nginx ingress, which load-balances per-call to our backend pods.
>>>>>
>>>>> The problem we have with this setup is that whenever the nginx ingresses reload they drop all client connections, which results in spikes of Unavailable errors from our gRPC clients. There are many nginx ingresses, but they all share a single IP; the incoming TCP connections are routed through a Google Cloud L4 load balancer. Whenever an nginx reload closes a TCP connection, the gRPC subchannel treats the backend as unavailable and goes into backoff logic, even though there are many more nginx pods that may be available immediately to serve traffic. My understanding is that with multiple subchannels, even if one nginx ingress is restarted the others can continue to serve requests and we shouldn't see Unavailable errors.
>>>>> My question is: what is the best way to make gRPC Core establish multiple connections to a single IP, so we can have long-lived connections to multiple nginx ingresses?
>>>>>
>>>>> Possibilities we've considered:
>>>>>
>>>>> - DNS round-robin with multiple public IPs on a single A record - we've tested this and it works, but it requires us to manually administer the DNS records and run multiple L4 LBs.
>>>>>
>>>>> - DNS SRV records - it seems like we could have multiple SRV records with the same hostname, but in my testing this requires us to add a look-aside load balancer as well, and to enable the c-ares DNS resolver, which doesn't seem to be production-ready.
>>>>>
>>>>> - Hosting a look-aside load balancer - we could host our own LB service, but it's not clear to me how we would overcome this issue for the LB service itself, since it would be behind the same nginx ingresses. I haven't found great documentation on how to set this up either.
>>>>>
>>>>> - Connection pooling in the client - wrapping the Ruby gRPC channels in a library that explicitly establishes multiple channels, each with one subchannel. I've tried to write this, but it's tricky to implement at a high level, and I couldn't get it to perform as well during failures as the DNS round-robin approach.
>>>>>
>>>>> Are there options I missed? Is there any supported pattern for this? Has anyone deployed a similar architecture (many clients connecting through nginx on a single public IP)?
>>>>>
>>>>> Thanks,
>>>>> Alysha
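For reference, the last option (a client-side channel pool) can be sketched in a few lines of Python. This is an illustration, not a supported pattern; the "grpc.channel_id" arg is not an official option, just an arbitrary distinct channel arg, on the assumption that channels with differing args won't share a subchannel in C-core:

```python
import itertools
import grpc

class ChannelPool:
    """Round-robin over several independent channels to one target, so each
    gets its own TCP connection through the L4 LB (and, with luck, lands on
    a different nginx ingress)."""

    def __init__(self, target, size=4, options=None):
        # A distinct dummy arg per channel, assumed to prevent the channels
        # from sharing a single underlying subchannel/connection.
        self._channels = [
            grpc.insecure_channel(
                target, options=[("grpc.channel_id", i)] + (options or []))
            for i in range(size)
        ]
        self._next = itertools.cycle(self._channels)

    def channel(self):
        """Return the next channel in round-robin order."""
        return next(self._next)

pool = ChannelPool("localhost:50051", size=4)
# Generated stubs are bound to one channel, so pick a fresh one per call:
# stub = EchoStub(pool.channel())   # EchoStub is a hypothetical stub
```

As Alysha notes, the hard part isn't the rotation itself but handling failures well: a real pool would also need to watch connectivity state and skip channels stuck in backoff.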