+1. My two cents: from an operations viewpoint, the “correct” approach is to first query for the leader, then ask the leader. The shortcoming Ben identified is obvious, but this is possibly the lesser of the two evils, and probably unavoidable in a distributed system without atomic transactions, which I don't think anyone on this list would advocate adding.
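The “ask for the leader, then ask the leader” flow, with a retry when the leader changes between the two requests, might look roughly like this. It is a minimal in-memory sketch under stated assumptions. `Cluster`, `NotLeader`, `ask_leader`, `ask_state` and `query_leader_state` are hypothetical names and do not represent the Mesos master HTTP API. The `on_between` hook exists only to simulate a failover between the two requests.

```python
class NotLeader(Exception):
    """Raised when a master that is not (or is no longer) the leader is asked for state."""


class Cluster:
    """In-memory stand-in for a set of masters (hypothetical; not the Mesos HTTP API)."""

    def __init__(self, masters, leader):
        self.masters = list(masters)
        self.leader = leader
        self.tasks = []

    def ask_leader(self, master):
        # Step 1: any master can report the leader. In this simulation all
        # masters agree; real masters may briefly disagree during failover.
        return self.leader

    def ask_state(self, master):
        # Step 2: only the current leader answers with state; a passive master
        # refuses rather than returning a misleading empty task list.
        if master != self.leader:
            raise NotLeader(master)
        return {"leader": master, "tasks": list(self.tasks)}


def query_leader_state(cluster, max_attempts=3, on_between=None):
    """Ask some master for the leader, then ask that leader for state.

    If the leader changed between the two requests (the race Ben describes),
    start over with the next master, up to max_attempts times.
    """
    for attempt in range(max_attempts):
        seed = cluster.masters[attempt % len(cluster.masters)]
        leader = cluster.ask_leader(seed)
        if on_between is not None:
            on_between(attempt)  # test hook: simulate a failover between requests
        try:
            return cluster.ask_state(leader)
        except NotLeader:
            continue  # leader information went stale; retry
    raise RuntimeError(f"no stable leader after {max_attempts} attempts")


# Usage: the leader moves from m1 to m2 right after we looked it up.
cluster = Cluster(["m1", "m2", "m3"], leader="m1")
cluster.tasks = ["t1"]


def failover_once(attempt):
    if attempt == 0:
        cluster.leader = "m2"


print(query_leader_state(cluster, on_between=failover_once))
# {'leader': 'm2', 'tasks': ['t1']}
```

The retry only narrows the race rather than removing it. The returned state is still a snapshot that can go stale as soon as it arrives. A leader that keeps flapping simply exhausts `max_attempts`.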
Thanks to the Benjamin(s) for (finally) giving a name to something I have encountered often :) (I used to informally call it “the A-B problems”; your naming is definitely more compelling!)

> On Jan 8, 2016, at 12:29 PM, Benjamin Mahler wrote:
>
> Some feedback on this ticket: it focuses on the solution rather than the
> problem. We generally want to avoid this, I guess it's been coined 'The XY
> Problem' (thanks Benjamin Bannier). In this case it turns out that there
> are actually 2 distinct problems that the user is facing:
>
> (1) Passive masters return information in some endpoints that can be
> interpreted as incorrect. A passive master does not know the list of tasks,
> for example, and so returning an empty list is less accurate than
> expressing that no response is possible.
>
> (2) It is difficult to reliably obtain cluster state through the existing
> endpoints. This one is less clear to me than the first problem. Here we
> have to think through how we want users to be hitting state endpoints. Do
> they hit all the masters and take the first valid response? Do they first
> ask for the leader, then query the leader? Both of these have races (the
> first case has an issue that the requests are not atomic, you may receive
> two valid responses; the second case the leader information may become
> stale before the second request). Do we add redirects? Even redirects have
> issues, there may be multiple redirects, there may be a redirect to a
> master that is unable to redirect further (and so we haven't really solved
> the race difficulties with redirects).
>
> The point is, it looks like we can easily solve (1), but (2) warrants more
> thought and will be easier to assess with the problem well understood.
>
> On Wed, Jan 6, 2016 at 12:52 PM, Diogo Gomes wrote:
>
>> Hi, Adam and Haosdent
>>
>> Resurrecting this issue, https://issues.apache.org/jira/browse/MESOS-1865,
>> I would like to make a +1 for this change, which apparently became cold but
>> I think is very relevant and we had enough time to be prepared for a change
>> like this, right?
>>
>> If necessary, can I help with something?
>>
>> Diogo Gomes
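Problem (1) in Benjamin Mahler's message asks that a passive master express that it cannot answer, rather than return an empty task list. The sketch below pairs that idea with the “hit all the masters and take the first valid response” client from (2). It rests on stated assumptions. `TasksResponse`, `tasks_endpoint` and `first_valid` are hypothetical. Using HTTP 503 (Service Unavailable) as the “cannot answer” signal is an illustrative choice, not what MESOS-1865 settled on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TasksResponse:
    """Hypothetical response shape: tasks is None when the master cannot know."""
    status: int                        # 200 = authoritative answer, 503 = cannot answer
    tasks: Optional[List[str]] = None


def tasks_endpoint(is_leader: bool, known_tasks: List[str]) -> TasksResponse:
    # A passive master does not know the task list, so it says so explicitly
    # instead of returning an empty list that reads as "zero tasks".
    if not is_leader:
        return TasksResponse(status=503)
    return TasksResponse(status=200, tasks=list(known_tasks))


def first_valid(responses: List[TasksResponse]) -> Optional[List[str]]:
    """Client strategy: query every master, keep the first authoritative answer."""
    for response in responses:
        if response.status == 200:
            return response.tasks
    return None  # no master could answer authoritatively


# Usage: one leader among three masters.
responses = [
    tasks_endpoint(False, []),
    tasks_endpoint(True, ["t1", "t2"]),
    tasks_endpoint(False, []),
]
print(first_valid(responses))  # ['t1', 't2']
```

The explicit 503 keeps a client from mistaking a passive master's reply for “the cluster has no tasks”, which is the easy fix for (1). It does nothing for (2). If two masters both believe they lead during a failover, `first_valid` silently picks whichever answered first.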
