I'm very happy to get so much feedback!

To Carl,
We don't have a container cluster management system like Kubernetes or Mesos running, so spinning up an additional instance would mean starting up a new machine, which would add significant latency to our deploys. We use round-robin (RR) load balancing. In RR, does the resolver refresh its server list only when all server connections are down, or each time a single server goes down?

To Kun,
Thanks for filing the issue. We are planning to use gRPC in Python, Go, and Java. We implemented our own Go DNS resolver that does support periodic updates, since the Go library seems to be missing DNS resolvers entirely. Do you know what the core implementation, which Python uses, does? Does the periodic refresh or TTL handling need to be implemented in core as well? Supporting rolling deploys and/or adding servers to a name and having clients pick them up periodically seems like a pretty critical feature for a production system. Is there a design for these cases documented somewhere?

To Luke and Kun,
I agree TTL seems like the most natural option since it's part of DNS. I think I saw an issue in core somewhere that mentioned DNS TTL, but I can't seem to dig it up.

Thanks!
Yunchi

On Tue, Dec 13, 2016 at 8:30 PM Luke Tyler Downey <[email protected]> wrote:

> Even better (for our use case in particular, but in general too probably)
> would be to actually respect the TTL in a DNS record.
>
> Luke
>
> On Dec 13, 2016, at 7:28 PM, Kun Zhang <[email protected]> wrote:
>
> The generalized version of the issue would be that you add a server but
> the client will never pick it up until one existing server is down.
> We could add periodic refreshing functionality to our DNS NameResolver.
> I have filed https://github.com/grpc/grpc-java/issues/2514
>
> On Tuesday, December 13, 2016 at 4:08:31 PM UTC-8, Carl Mastrangelo wrote:
>
> A couple things:
>
> * Is there any way to add a new server to the pool before turning down the
> previous replica? That would make you slightly over capacity during the
> rollout.
> * The load balancer in Java works differently based on the strategy you
> use. In RR, the LB maintains a connection to each of the replicas, so that
> when one goes down the next one will be used. This means that as long as
> you have some in the pool, it will always go to the next connection. That
> said, if you are doing a rolling restart, this pool will get smaller and
> smaller until every connection has been killed and marked unusable. At
> that point I believe it will refresh the list.
>
> On Friday, December 9, 2016 at 2:45:57 PM UTC-8, [email protected] wrote:
>
> I'm a bit unclear as to how name resolvers in GRPC work with load
> balancing in cases of rolling deploys. As far as I can tell, it seems like
> the most recently deployed server will end up with no traffic if we follow
> our current deploy strategy.
>
> Our rolling deploy strategy that we plan to adopt for GRPC works as follows:
>
> 1. Build artifacts
> 2. De-register a server replica from its DNS name
> 3. Update the server and restart it
> 4. Re-register the server replica with its DNS name
>
> For the Java GRPC implementation, it looks like the GRPC name resolver does
> not refresh the list of IPs unless
> <https://github.com/grpc/grpc-java/blob/e9779d7c00cde14b33ba3239c859f997e70c2b2e/core/src/main/java/io/grpc/internal/ManagedChannelImpl.java#L696>
> (1) there is an error in a previous resolve or (2) a server goes down. I
> believe the core implementation does the same thing, though I'm not
> familiar enough with C to really tell.
>
> What I believe will happen during a rolling deploy is:
>
> 1. Before deploy: Client is talking to N nodes
> 2. A server is removed from DNS; nothing happens on the client
> 3. The server issues a GOAWAY frame to clients. The client removes the
> server from its list of connections, and resolves a new list of servers,
> finding any newly added servers
> 4. The server is restarted and added to the DNS
> 5. Repeat for all other servers in the server set
> 6. After deploy: Client is talking to N-1 nodes and will never attempt to
> look for the last server to be restarted
>
> Is my analysis correct? And if so, what is the recommended way to make
> sure the client ends up talking to all N servers after a rolling deploy?
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "grpc.io" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/grpc-io/wxgLgjzkR30/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/grpc-io.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/grpc-io/B2119977-C217-4B0A-94A9-ED616CD9DD47%40compass.com
> <https://groups.google.com/d/msgid/grpc-io/B2119977-C217-4B0A-94A9-ED616CD9DD47%40compass.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
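The rolling-deploy analysis in the original message, and the question of whether round-robin re-resolves on any single disconnect or only once every connection is down, can be replayed with a toy model. This is a minimal sketch under stated assumptions, not gRPC code: `rolling_deploy`, the policy names `"any_disconnect"` and `"all_down"`, and the addresses are all hypothetical, and the model ignores timing, retries, and reconnect backoff. It only tracks which addresses the client is connected to after each deploy step.

```python
"""Toy replay of the rolling deploy described in the original message.

Hypothetical model, not gRPC code: it ignores timing, retries, and backoff,
and only tracks which addresses the client is connected to.
"""

REFRESH_POLICIES = ("any_disconnect", "all_down")


def rolling_deploy(servers, refresh_on="any_disconnect", timer_refresh=False):
    """Return the client's connection set after restarting every server once.

    refresh_on:
      "any_disconnect" - re-resolve DNS each time one server sends GOAWAY
      "all_down"       - re-resolve only once every connection is gone
                         (the round-robin behavior Carl describes)
    timer_refresh: model one periodic or TTL-driven re-resolve after the deploy.
    """
    if refresh_on not in REFRESH_POLICIES:
        raise ValueError(f"unknown refresh policy: {refresh_on!r}")

    dns = set(servers)        # addresses currently registered under the name
    connected = set(servers)  # before deploy: client talks to all N nodes

    for server in servers:
        dns.discard(server)        # de-register: nothing happens on the client
        connected.discard(server)  # GOAWAY: the client drops this connection
        if refresh_on == "any_disconnect" or not connected:
            connected = set(dns)   # re-resolve sees only what DNS has right now
        dns.add(server)            # restart and re-register the replica

    if timer_refresh:
        connected = set(dns)       # a later re-resolve finally sees every node
    return connected


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
for policy in REFRESH_POLICIES:
    print(policy, sorted(rolling_deploy(servers, refresh_on=policy)))
print("timer", sorted(rolling_deploy(servers, timer_refresh=True)))
```

In this model both refresh policies finish with only `10.0.0.1` and `10.0.0.2`, missing the last server restarted, which matches the N-1 outcome predicted above; a single re-resolve after the deploy (the periodic refresh Kun proposes) brings the client back to all three.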

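Luke's suggestion of respecting the TTL in the DNS record, which also covers Kun's periodic refresh, can be sketched as a small resolver cache that only re-resolves once the record's TTL has expired. This is an illustrative sketch, not a gRPC `NameResolver` API: `TtlResolver`, `min_interval`, the `lookup` callable, and `backend.example` are hypothetical. The lookup is injected because Python's `socket.getaddrinfo` does not report TTLs, so a real version would need a DNS client library that does.

```python
import time
from typing import Callable, FrozenSet, Tuple

# Hypothetical lookup signature: name -> (addresses, ttl_seconds).
# The standard library's socket.getaddrinfo does not report TTLs, so a real
# version would need a DNS client library; here the lookup is injected.
Lookup = Callable[[str], Tuple[FrozenSet[str], float]]


class TtlResolver:
    """Cache one name's addresses and re-resolve after the record's TTL.

    Illustrative sketch only; this is not a gRPC NameResolver API.
    """

    def __init__(self, name: str, lookup: Lookup,
                 clock: Callable[[], float] = time.monotonic,
                 min_interval: float = 1.0) -> None:
        self.name = name
        self.lookup = lookup
        self.clock = clock
        self.min_interval = min_interval  # floor so a TTL of 0 cannot busy-loop
        self.addresses: FrozenSet[str] = frozenset()
        self.expires_at = float("-inf")   # forces a lookup on first use

    def current(self) -> Tuple[FrozenSet[str], bool]:
        """Return (addresses, changed); re-resolves only once the TTL expired."""
        now = self.clock()
        if now < self.expires_at:
            return self.addresses, False
        addresses, ttl = self.lookup(self.name)
        self.expires_at = now + max(ttl, self.min_interval)
        changed = addresses != self.addresses
        self.addresses = addresses
        return addresses, changed


# Demo with a fake clock and fake records (hypothetical name and addresses).
records = {"backend.example": frozenset({"10.0.0.1", "10.0.0.2"})}
fake_now = [0.0]
resolver = TtlResolver("backend.example",
                       lookup=lambda name: (records[name], 30.0),
                       clock=lambda: fake_now[0])

print(sorted(resolver.current()[0]))  # ['10.0.0.1', '10.0.0.2']
records["backend.example"] = frozenset({"10.0.0.1", "10.0.0.2", "10.0.0.3"})
fake_now[0] = 10.0
print(resolver.current()[1])          # False: still inside the 30 s TTL
fake_now[0] = 31.0
addrs, changed = resolver.current()
print(sorted(addrs), changed)         # ['10.0.0.1', '10.0.0.2', '10.0.0.3'] True
```

In this sketch the `changed` flag is where a client would hand the new address list to its load balancer, so a server added to the name is picked up within one TTL instead of only after existing connections fail. The `min_interval` floor is a design choice to keep a zero or very short TTL from turning into a lookup on every call.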