Hey grpc users,

I recently migrated a streaming service to gRPC and have a couple of questions.

I have a small group of servers (8), discoverable via DNS, serving a large 
number of clients (100k+) over a streaming API. I control the client code (a 
fat client library) but not the clients' lifecycle (i.e. when they restart). 
I want to keep the number of clients roughly balanced across servers while 
maintaining only one connection per client. This means I can't use the 
roundrobin load balancer (or any custom Picker), since it creates one SubConn 
per address and therefore opens TCP connections to all 8 servers at once. 

So far, I've implemented a rendezvous hashing algorithm in a custom DNS 
resolver and used the pickfirst LB strategy, which connects to only one peer 
at a time, trying addresses in priority order. However, I ran into two issues:
1. If I restart one of the servers, that server will never get any clients 
unless the clients also restart. Before gRPC, we had a stream timeout that 
also timed out the connection, so after a certain interval the client would 
kill the connection and establish a new one in priority order. This allowed 
clients that had been kicked off their preferred peer to reconnect to it 
after a certain interval. This behavior stopped working when I migrated to 
gRPC, because killing the stream doesn't terminate the underlying 
connection. Is there any way to do this with the gRPC Go library?
2. Similarly, when a server comes online there's a warm-up period while it 
loads data, so the stream handler returns an UNAVAILABLE error if the server 
is not ready. This doesn't kill the underlying gRPC connection, so the 
client retries against the same server over and over until it has warmed up. 
Is there any way to treat a connection as failed based on the error returned 
by a stream, so that the ClientConn/SubConn tries a different peer?
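
For readers unfamiliar with it, the rendezvous (highest-random-weight) ordering mentioned above could be sketched roughly as follows. This is only an illustration, not the actual resolver code: the FNV-1a hash, the clientID key, and the names score and rankPeers are all assumptions. The ranked list is what a resolver would hand to pickfirst as its priority order.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// score hashes the (clientID, peer) pair. For a given client, the peer
// with the highest score is the preferred one. FNV-1a is an assumption
// made for this sketch; any well-mixed hash would do.
func score(clientID, peer string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(clientID))
	h.Write([]byte{0}) // separator so "ab"+"c" and "a"+"bc" hash differently
	h.Write([]byte(peer))
	return h.Sum64()
}

// rankPeers returns a copy of peers ordered from most to least preferred
// for clientID. The result does not depend on the order of the input, and
// removing a peer leaves the relative order of the remaining peers intact.
func rankPeers(clientID string, peers []string) []string {
	ranked := append([]string(nil), peers...)
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := score(clientID, ranked[i]), score(clientID, ranked[j])
		if si != sj {
			return si > sj
		}
		return ranked[i] < ranked[j] // deterministic tie-break
	})
	return ranked
}
```

Calling rankPeers("client-42", addrs) on every DNS refresh gives each client a stable preference list, so when a restarted server reappears in DNS it goes back to the top of the list for the clients that preferred it. Whether the client actually moves back depends on the connection being re-established, which is the problem described in issue 1.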

I'm trying to avoid re-implementing the low-level dialing, backoff, and 
reconnection logic provided by gRPC's SubConn and ClientConn. But from 
reading the code, it doesn't look like I have much control over the 
lifecycle of the underlying connection, so it seems the only way to do this 
is to grpc.Dial each server individually and manage the DNS peer list and 
redials outside of the load balancer API. Is that right?
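
To make that fallback concrete, here is a minimal sketch of walking the peer list in priority order outside the load balancer API. It is an assumption-laden illustration rather than a recommendation. dialFunc is a hypothetical stand-in for whatever dials one server and checks that it is ready (for example, a wrapper around grpc.Dial that also opens the stream), and connectInPriorityOrder and errNoPeers are made-up names. Backoff and periodic redial are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPeers is returned when the peer list is empty.
var errNoPeers = errors.New("no peers to connect to")

// dialFunc is a hypothetical hook: it should return nil only if the
// server at addr accepted the connection and is ready to serve
// (i.e. it did not answer with UNAVAILABLE during warm-up).
type dialFunc func(addr string) error

// connectInPriorityOrder tries each peer in order and returns the address
// of the first one for which dial succeeds. If every peer fails, it
// returns an error wrapping the last failure.
func connectInPriorityOrder(peers []string, dial dialFunc) (string, error) {
	if len(peers) == 0 {
		return "", errNoPeers
	}
	var lastErr error
	for _, addr := range peers {
		if err := dial(addr); err != nil {
			lastErr = err
			continue // fall through to the next-preferred peer
		}
		return addr, nil
	}
	return "", fmt.Errorf("all %d peers failed, last error: %w", len(peers), lastErr)
}
```

With something like this, a client timer could close the current connection after the stream timeout and call connectInPriorityOrder again, which would restore the old behavior from issue 1 at the cost of owning the redial logic.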
