On Fri, Jan 8, 2016 at 12:29 PM, Benjamin Mahler <[email protected]> wrote:
> (2) It is difficult to reliably obtain cluster state through the existing
> endpoints. This one is less clear to me than the first problem. Here we
> have to think through how we want users to be hitting state endpoints. Do
> they hit all the masters and take the first valid response? Do they first
> ask for the leader, then query the leader? Both of these have races (the
> first case has an issue that the requests are not atomic, you may receive
> two valid responses; in the second case the leader information may become
> stale before the second request). Do we add redirects? Even redirects have
> issues, there may be multiple redirects, there may be a redirect to a
> master that is unable to redirect further (and so we haven't really solved
> the race difficulties with redirects).

I believe the proposed behavior is:

* Clients can query any master
* Endpoint queries against a non-leading master result in redirects to
the current leader

If the client follows a redirect to a different master, it may get
redirected one or more times; it might also be unable to reach the
current leader, or the queried master might be unable to determine the
current leader. That seems like quite reasonable behavior to me,
though (and technically I would argue that these situations aren't
really "races" -- the client just needs to recognize that as in any
distributed system, the information it observes might be stale).
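
To make the redirect behavior concrete, here is a minimal client-side
sketch. It is not Mesos code: the `fetch` callable, the
`(status, location, body)` response shape, the hop limit, and the
master names are all assumptions made for illustration. The point is
only that a client can follow redirects under a bounded budget and
treat "could not reach a leader" as an ordinary, retryable outcome.

```python
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (HTTP status, Location target or None, body).
Response = Tuple[int, Optional[str], str]

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


class LeaderUnavailable(Exception):
    """No master answered as leader within the hop budget."""


def query_state(start: str,
                fetch: Callable[[str], Response],
                max_hops: int = 5) -> str:
    """Query `start`, following redirects until some master answers 200."""
    master = start
    for _ in range(max_hops):
        status, location, body = fetch(master)
        if status == 200:
            # This master believed it was the leader when it answered;
            # the state may already be stale by the time we read it.
            return body
        if status in REDIRECT_STATUSES and location:
            master = location  # try whoever this master thinks is leading
            continue
        # E.g. the master could not determine the current leader.
        raise LeaderUnavailable(f"{master} returned {status}")
    raise LeaderUnavailable(f"gave up after {max_hops} hops")


# Simulated three-master cluster (made-up names): m1 has a stale view and
# points at m2, m2 points at the actual leader m3.
SIMULATED = {
    "m1": (307, "m2", ""),
    "m2": (307, "m3", ""),
    "m3": (200, None, "cluster-state"),
}


def simulated_fetch(master: str) -> Response:
    return SIMULATED[master]
```

Calling `query_state("m1", simulated_fetch)` walks m1, m2, m3 and
returns the body served by m3. A caller that gets `LeaderUnavailable`
would simply back off and try again.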

We could alternatively introduce a "who-is-the-current-leader"
endpoint (which is something people have asked for [1]). As long as
non-leading masters notify clients that they aren't talking to a
leader (e.g., by returning a 403/503 error), that should also avoid
races.
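
A rough sketch of how a client might use such an endpoint, again with
everything hypothetical: `get_leader` stands in for the proposed
"who-is-the-current-leader" call, `get_state` for a state query, and
any non-200 status is treated as "this master is not the leader". If
the leader changes between the two requests, the client sees the
rejection and just asks again.

```python
from typing import Callable, Tuple


def query_via_leader_lookup(any_master: str,
                            get_leader: Callable[[str], str],
                            get_state: Callable[[str], Tuple[int, str]],
                            max_attempts: int = 3) -> str:
    """Ask any master who leads, then query that master for state."""
    for _ in range(max_attempts):
        leader = get_leader(any_master)
        status, body = get_state(leader)
        if status == 200:
            return body
        # The lookup result went stale before the second request:
        # the master we were sent to rejected us as a non-leader.
        # Loop and look up the leader again.
    raise RuntimeError(f"no leader answered after {max_attempts} attempts")
```

The rejection from non-leading masters is what makes this safe: without
it, a stale lookup would silently return state from a master that is no
longer leading.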

Neil

[1] https://issues.apache.org/jira/browse/MESOS-3841
