Do your clients all start up at the same time? If not, it's not clear to me why your setup wouldn't work. If the clients' start times are randomly distributed, and the server closes each connection after 30m, then the connection close times should be just as randomly distributed as the start times, which means that as soon as the new server comes up, clients should start trickling over to it. It may take 30m for the load to fully balance, but the new server should start receiving new load immediately, and that load should ramp up steadily over the 30m period.
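For what it's worth, in gRPC-Go this kind of periodic recycling doesn't need to be done by hand: the server keepalive option MaxConnectionAge sends clients a GOAWAY once a connection reaches the configured age. A minimal sketch (the listen address and the grace period are just placeholders, not values from this thread):

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Placeholder listen address; substitute your own.
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}

	// MaxConnectionAge makes the server send a GOAWAY once a connection is
	// ~30m old; clients then reconnect, re-resolve, and can land on a newly
	// started backend. MaxConnectionAgeGrace bounds how long in-flight
	// streams may keep running before the connection is forcibly closed.
	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      30 * time.Minute,
		MaxConnectionAgeGrace: time.Minute, // illustrative value
	}))

	// Register your service implementations here before serving.
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```

I believe gRPC-Go also applies a small random jitter to MaxConnectionAge, which helps keep the close times spread out even if many clients happened to connect at the same moment.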
I don't know anything about the Kubernetes autoscaling side of things, but maybe there are parameters you can tune there to give the new server more time to accumulate load before Kubernetes kills it (a sketch of one such knob follows after the quoted message below).

In general, it's not clear to me that there's a better approach than the one you're already taking. There's always some tension between load balancing and streaming RPCs: the whole point of a streaming RPC is that it doesn't go through load balancing for each individual message, which means that all of the messages go to the same backend.

I hope this information is helpful.

On Mon, Oct 7, 2019 at 3:54 PM howardjohn via grpc.io <grpc-io@googlegroups.com> wrote:

> We have a case where we have many clients and few servers, typically a
> 1000:1 ratio. The traffic is a single bidirectional stream per client.
>
> The problem we are seeing is that when a new server comes up, it has no
> clients connected, because they all maintain their connections to the
> existing servers.
>
> This is made worse by Kubernetes autoscaling: since the new server has no
> load, it gets scaled down, and we flip-flop between n and n+1 replicas.
> This graph shows the behavior pretty well:
> https://snapshot.raintank.io/dashboard/snapshot/SceOCrNpdOr4qmTUk1UHF20xMiNqGk6K?panelId=4&fullscreen&orgId=2
>
> As a mitigation, we have the server close its connections every 30m. This
> is not great, because it takes at least 30 min to balance, and due to the
> issue above it generally never finishes balancing.
>
> I am wondering if there are any best practices for handling this type of
> problem?
>
> One possible idea we have is for the servers to share load information and
> shed load when they have more than their "fair share" of connections, but
> this is pretty complex.

--
Mark D. Roth <r...@google.com>
Software Engineer
Google, Inc.
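On the autoscaling parameters mentioned in the reply above: one knob that may help (an illustrative addition, not something the thread confirms) is the HPA's scale-down stabilization window, which delays scale-down so a fresh replica has time to accumulate streams before it looks idle. A hypothetical sketch using the Kubernetes autoscaling/v2 Go types, which are only available in newer Kubernetes versions:

```go
// Package scaling sketches HPA tuning for slowly-rebalancing streaming load.
package scaling

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
)

// scaleDownBehavior builds an HPA behavior whose scale-down stabilization
// window matches the 30m connection-recycling period, so a new replica is
// not removed before recycled clients have had a chance to reach it.
// The 30m figure mirrors the thread's recycling interval; field names
// assume the autoscaling/v2 API.
func scaleDownBehavior() *autoscalingv2.HorizontalPodAutoscalerBehavior {
	window := int32(1800) // 30 minutes, in seconds; align with MaxConnectionAge
	return &autoscalingv2.HorizontalPodAutoscalerBehavior{
		ScaleDown: &autoscalingv2.HPAScalingRules{
			StabilizationWindowSeconds: &window,
		},
	}
}
```

The same setting appears as spec.behavior.scaleDown.stabilizationWindowSeconds when the HorizontalPodAutoscaler is written as YAML.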