Some feedback on this ticket: it focuses on the solution rather than the problem. We generally want to avoid this; I gather it's known as 'The XY Problem' (thanks Benjamin Bannier). In this case, it turns out that the user is actually facing two distinct problems:
(1) Passive masters return information from some endpoints that can be interpreted as incorrect. A passive master does not know the list of tasks, for example, so returning an empty list is less accurate than indicating that no response is possible.

(2) It is difficult to reliably obtain cluster state through the existing endpoints. This one is less clear to me than the first problem. Here we have to think through how we want users to hit the state endpoints. Do they hit all the masters and take the first valid response? Do they first ask for the leader, then query the leader? Both of these have races: in the first case the requests are not atomic, so you may receive two valid responses; in the second case the leader information may become stale before the second request. Do we add redirects? Even redirects have issues: there may be multiple redirects, or a redirect to a master that is unable to redirect further, so redirects don't really solve the race difficulties.

The point is, it looks like we can easily solve (1), but (2) warrants more thought and will be easier to assess once the problem is well understood.

On Wed, Jan 6, 2016 at 12:52 PM, Diogo Gomes <[email protected]> wrote:

> Hi, Adam and Haosdent
>
> Resurrecting this issue, https://issues.apache.org/jira/browse/MESOS-1865,
> I would like to make a +1 for this change, which apparently became cold but
> I think is very relevant and we had enough time to be prepared for a change
> like this, right?
>
> If necessary, can I help with something?
>
> Diogo Gomes
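
To make the second race in (2) concrete, here is a minimal Python sketch. Everything in it (`Cluster`, `get_leader`, `get_state`, `fail_over`, `fetch_state`) is hypothetical and simulates the masters in memory; none of it is Mesos API. It assumes a non-leading master refuses to answer, as suggested for (1), so the client can at least detect stale leader information and retry.

```python
class NotLeader(Exception):
    """Hypothetical error: a non-leading master refuses to report state."""


class Cluster:
    """In-memory stand-in for a set of masters (not a real Mesos client)."""

    def __init__(self, names, leader, tasks):
        self.names = list(names)
        self._leader = leader
        self._tasks = list(tasks)

    def get_leader(self):
        # Request 1: ask who the leader is.
        return self._leader

    def get_state(self, name):
        # Request 2: ask a specific master for its state.
        # Assumption from (1): a passive master says "cannot answer"
        # instead of returning a misleading empty task list.
        if name != self._leader:
            raise NotLeader(name)
        return list(self._tasks)

    def fail_over(self, new_leader):
        self._leader = new_leader


def fetch_state(cluster, between_requests=None, max_attempts=3):
    """Ask for the leader, then query it; retry if the answer went stale."""
    for attempt in range(1, max_attempts + 1):
        name = cluster.get_leader()
        if between_requests is not None:
            between_requests()  # leadership may change in this window
        try:
            return cluster.get_state(name), attempt
        except NotLeader:
            continue  # leader information was stale; start over
    raise RuntimeError("no stable leader found")


# Demo: a single failover happens between the two requests.
cluster = Cluster(["m1", "m2"], leader="m1", tasks=["task-1"])
pending = ["m2"]


def failover_once():
    if pending:
        cluster.fail_over(pending.pop())


state, attempts = fetch_state(cluster, between_requests=failover_once)
print(state, attempts)  # prints ['task-1'] 2
```

Note that the retry only bounds the race rather than removing it: if leadership keeps changing between the two requests, the loop gives up after `max_attempts`.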
