On Tue, Jan 24, 2017 at 1:37 PM, Mark D. Roth <[email protected]> wrote:
>
> I started a similar discussion on canarying ConfigMaps under Kubernetes.
>> While this design is the way at least one push mechanism inside Google
>> works, there's also a more predictable, push-like one (GDP's), where you
>> pick N candidates, tell them to get a new config and watch them for T
>> minutes, making sure they don't die. That, of course, assumes you keep
>> track of the clients, through e.g. grpclb, and can also track their health
>> (Borg and Kubernetes both do). Would you consider that for future
>> developments?
>>
>
> The problem is that we have no mechanism for tracking clients that way.
> Tracking them via grpclb won't work, because clients should be able to use
> the service config without using grpclb.  And more generally, any tracking
> mechanism like this would require a lot of communication between servers
> and clients, which would add a lot of complexity.  I don't think that the
> advantage of getting slightly more accurate canary percentages is really
> worth that complexity.
>


Sorry, I don't know what I was thinking when I mentioned grpclb. And I
should have explained things further.

For additional background, I also created an issue for TXT record support
in Kubernetes' kube-dns, citing gRPC as an use case:
https://github.com/kubernetes/dns/issues/38

My idea is that there would be a 1:1 mapping between a Kubernetes service
and a gRPC service, something that developers seem to like. For example,
you'd point clients to mysvc.myns.svc.mycluster.example.com (or a shorter
relative name). That's a DNS hostname managed by kube-dns. With the new
support, it would return the appropriate TXT record. More in detail:

There's a controller (daemon) in the Kubernetes cluster in charge of
pushing gRPC config changes. It sends an update request on the mysvc
service object. kube-dns, also running in the cluster, sees the object
change and starts serving new TXT records. So far, gRPC itself doesn't need
to be involved, i.e. there's no special protocol between servers and
clients (or grpclb). Clients keep using DNS.

With your current design, the controller pushes a config with e.g. a 1%
canary. Accurate percentages is not even the main issue; determinism and
proper health check tracking are. Assuming that exactly 1% of clients pick
up the change, now you have the issue of figuring if a replica that just
crashed did it because of the new config or for unrelated reasons.
Engineers really hate it when a configuration change gets rolled back
automatically because of a random external event. Conversely, when an
unhealthy service is seeing an elevated crash rate and you try to push a
new config to stop the bleeding, it's very valuable to know if the new
settings are making instances stable again.

In both cases, you need to track which configuration each client is using.
You could do this through e.g. monitoring (each client reports the current
config version) and correlate that with health checks.

Or, more simply, you could have the controller pick up N victims, push a
config with a field along the lines of

`"hosts": "host1:port,host2:port,..."`

wait n=TTL seconds and then keep tracking the health status for those
clients. This is all done in the controller, which is responsible for
discovery, tracking clients through the proper APIs. The example above
involves Kubernetes, but, in general, the same mechanism applies to every
other environment. The only client change is for the grpc client to match
its own host:port against "hosts". A proper config should have either
"percentage" or "hosts", not both. Or maybe the latter always wins. The
idea is that you canary with a small number of clients, then switch to
percentages, so that the config doesn't get bloated with tens or hundreds
of hostnames.

This approach has also the advantage that, if you have N groups of clients
(e.g. three different frontend services that talk to a shared database),
you can treat them and push to each of them independently. "clientLanguage"
might be too coarse, especially when all your clients are in the same
language. :-)



> I'd prefer to avoid that if possible, because I think it's valuable for
> debugging purposes to have the TXT record in human-readable form.  However,
> we can certainly add support for something like this if/when people start
> running into the length limitation.
>

I agree. It's really ugly and a true last resort to be implemented only
when the actual need arises.


-- 
Rudi Chiarito — Infrastructure — Clarifai, Inc.
"Trust me, I know what I'm doing." (Sledge Hammer!)

-- 
You received this message because you are subscribed to the Google Groups 
"grpc.io" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/grpc-io.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/grpc-io/CAPzY349gnGQzxrqGXOaNUhpZTqcoW9houwRc_L6xBEJytH1LUQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to