On Wed, Jan 25, 2017 at 6:39 PM, Rudi Chiarito <[email protected]> wrote:

> On Tue, Jan 24, 2017 at 1:37 PM, Mark D. Roth <[email protected]> wrote:
>>
>> I started a similar discussion on canarying ConfigMaps under Kubernetes.
>>> While this design is the way at least one push mechanism inside Google
>>> works, there's also a more predictable, push-like one (GDP's), where you
>>> pick N candidates, tell them to get a new config and watch them for T
>>> minutes, making sure they don't die. That, of course, assumes you keep
>>> track of the clients, through e.g. grpclb, and can also track their health
>>> (Borg and Kubernetes both do). Would you consider that for future
>>> developments?
>>>
>>
>> The problem is that we have no mechanism for tracking clients that way.
>> Tracking them via grpclb won't work, because clients should be able to use
>> the service config without using grpclb.  And more generally, any tracking
>> mechanism like this would require a lot of communication between servers
>> and clients, which would add a lot of complexity.  I don't think that the
>> advantage of getting slightly more accurate canary percentages is really
>> worth that complexity.
>>
>
>
> Sorry, I don't know what I was thinking when I mentioned grpclb. And I
> should have explained things further.
>
> For additional background, I also created an issue for TXT record support
> in Kubernetes' kube-dns, citing gRPC as a use case:
> https://github.com/kubernetes/dns/issues/38
>
> My idea is that there would be a 1:1 mapping between a Kubernetes service
> and a gRPC service, something that developers seem to like. For example,
> you'd point clients to mysvc.myns.svc.mycluster.example.com (or a shorter
> relative name). That's a DNS hostname managed by kube-dns. With the new
> support, it would return the appropriate TXT record. In more detail:
>
> There's a controller (daemon) in the Kubernetes cluster in charge of
> pushing gRPC config changes. It sends an update request on the mysvc
> service object. kube-dns, also running in the cluster, sees the object
> change and starts serving new TXT records. So far, gRPC itself doesn't need
> to be involved, i.e. there's no special protocol between servers and
> clients (or grpclb). Clients keep using DNS.
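> 
> To make this concrete, here's a rough sketch of what kube-dns might end up
> serving (the record name and the "grpc_config" attribute are just how I
> read the current draft, so treat them as illustrative):
> 
>   _grpc_config.mysvc.myns.svc.mycluster.example.com. IN TXT
>     "grpc_config=[{\"serviceConfig\": {...}}]"
> 
> The controller only ever touches the Kubernetes service object; kube-dns
> turns that change into the TXT record above.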
>
> With your current design, the controller pushes a config with e.g. a 1%
> canary. Accurate percentages are not even the main issue; determinism and
> proper health check tracking are. Assuming that exactly 1% of clients pick
> up the change, you now have the problem of figuring out whether a replica
> that just crashed did so because of the new config or for unrelated
> reasons.
> Engineers really hate it when a configuration change gets rolled back
> automatically because of a random external event. Conversely, when an
> unhealthy service is seeing an elevated crash rate and you try to push a
> new config to stop the bleeding, it's very valuable to know if the new
> settings are making instances stable again.
>
> In both cases, you need to track which configuration each client is using.
> You could do this through e.g. monitoring (each client reports the current
> config version) and correlate that with health checks.
>
> Or, more simply, you could have the controller pick N victims, push a
> config with a field along the lines of
>
> `"hosts": "host1:port,host2:port,..."`
>
> wait TTL seconds and then keep tracking the health status of those
> clients. This is all done in the controller, which is responsible for
> discovery and tracks clients through the proper APIs. The example above
> involves Kubernetes, but, in general, the same mechanism applies to every
> other environment. The only client change is for the grpc client to match
> its own host:port against "hosts". A proper config should have either
> "percentage" or "hosts", not both. Or maybe the latter always wins. The
> idea is that you canary with a small number of clients, then switch to
> percentages, so that the config doesn't get bloated with tens or hundreds
> of hostnames.
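> 
> As a sketch (the values are made up, and I'm assuming the existing
> choice-list format where a client uses the first entry that matches it),
> the first push would pin the new config to a couple of named victims, and
> once they look healthy the controller would replace it with a
> percentage-based push:
> 
>   [{"hosts": "host1:port,host2:port",
>     "serviceConfig": { ...new settings... }},
>    {"serviceConfig": { ...old settings... }}]
> 
>   [{"percentage": 1,
>     "serviceConfig": { ...new settings... }},
>    {"serviceConfig": { ...old settings... }}]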
>
> This approach also has the advantage that, if you have N groups of
> clients (e.g. three different frontend services that talk to a shared
> database), you can treat them separately and push to each of them
> independently. "clientLanguage"
> might be too coarse, especially when all your clients are in the same
> language. :-)
>

Thanks for the detailed explanation of this use-case.  As I think I
mentioned up-thread, I certainly agree that providing some mechanism to
allow deterministic client selection would be useful.

I'm warming to the idea of adding a 'hosts' selector field, but I do worry
that people could easily start running into the TXT record length
limitation if they start creating very long lists of hosts.

We might be able to ameliorate some of that by allowing some sort of simple
pattern-matching language, although I'd prefer to avoid taking an external
dependency on a regexp library, so it would probably need to be something
very simple -- like maybe simple wildcard matching triggered by a '*'
character.  So, for example, if your three different frontend services run
on hosts named as follows:

Frontend service "Foo": foofrontend1, foofrontend2, foofrontend3, ...
Frontend service "Bar": barfrontend1, barfrontend2, barfrontend3, ...
Frontend service "Baz": bazfrontend1, bazfrontend2, bazfrontend3, ...

Then you could select only the frontends from the first service by saying
something like "foofrontend*".  But we probably would *not* allow something
like "foofrontend[12]" to get just the first two frontends of service
"Foo"; instead, you would need to list them separately.  Would something
like that be useful in your use case?
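
To illustrate (this is purely a sketch, and it assumes the list-of-strings
encoding I suggest below), the canary choice for service "Foo" would then
carry something like:

  'hosts': ['foofrontend*']

where '*' matches any (possibly empty) suffix and there are no other
metacharacters.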

Another possibility here would be to select by IP address.  In that case,
we could even allow subnet notation to select a whole range of IP addresses
at once.  (Though there could be some complexity here with regard to
multi-homed hosts and how they'd figure out which IP would apply to them.)
 Would something like that be useful?
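
As a purely hypothetical sketch (we'd have to decide whether this reuses
'hosts' or gets its own field, and exactly how an individual address vs. a
subnet is spelled), that might look like:

  'hosts': ['10.4.0.17', '10.8.0.0/22']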

Do we actually want to select the client port in any of these cases?  I'm
not sure that's useful, since the client port would presumably be different
for each backend it's connected to, and it would change any time it
reconnected to a given backend.  Is there a use-case where selecting on the
client port is useful?

In terms of how this is encoded in JSON, I would probably want it to be a
list of strings rather than a single string with a delimiter character.  In
other words, instead of 'hosts': 'host1,host2,...', it would be something
like 'hosts': ['host1','host2',...].

What do you think?


>
>
>
>> I'd prefer to avoid that if possible, because I think it's valuable for
>> debugging purposes to have the TXT record in human-readable form.  However,
>> we can certainly add support for something like this if/when people start
>> running into the length limitation.
>>
>
> I agree. It's really ugly and a true last resort to be implemented only
> when the actual need arises.
>
>
> --
> Rudi Chiarito — Infrastructure — Clarifai, Inc.
> "Trust me, I know what I'm doing." (Sledge Hammer!)
>



-- 
Mark D. Roth <[email protected]>
Software Engineer
Google, Inc.
