The semantics of these changes would have an impact on the upcoming task reconciliation.
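To make the proposal concrete, here is a rough sketch of the annotation approach (option 2 in Adam's message below); the types and helper names are hypothetical stand-ins, not the actual Mesos status update manager code:

    // Hypothetical sketch of the "annotate the oldest update" idea
    // (option 2 below); types and names are illustrative, not the
    // actual Mesos status update manager code.
    #include <deque>
    #include <iostream>
    #include <optional>
    #include <string>

    struct StatusUpdate {
      std::string state;                       // e.g. "TASK_RUNNING"
      std::optional<std::string> latestState;  // terminal-state annotation
    };

    bool isTerminal(const std::string& state) {
      return state == "TASK_FINISHED" || state == "TASK_FAILED" ||
             state == "TASK_KILLED" || state == "TASK_LOST";
    }

    // The slave still (re)sends only the oldest unacked update, but if
    // a terminal update is queued behind it, the terminal state is
    // tacked on so the master can removeTask and recover resources
    // right away.
    std::optional<StatusUpdate> nextToSend(
        const std::deque<StatusUpdate>& stream) {
      if (stream.empty()) {
        return std::nullopt;
      }
      StatusUpdate front = stream.front();
      for (const StatusUpdate& update : stream) {
        if (isTerminal(update.state)) {
          front.latestState = update.state;
          break;
        }
      }
      return front;
    }

    int main() {
      std::deque<StatusUpdate> stream = {
        {"TASK_RUNNING", std::nullopt},
        {"TASK_FINISHED", std::nullopt},
      };

      std::optional<StatusUpdate> update = nextToSend(stream);

      // Prints "TASK_RUNNING (terminal: TASK_FINISHED)": the scheduler
      // still sees updates strictly in order, while the master learns
      // the task is done and can recover its resources.
      std::cout << update->state << " (terminal: "
                << update->latestState.value_or("none") << ")" << std::endl;
    }
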
@BenM: Can you chime in here on how this fits into the task reconciliation work that you've been leading?

On Wed, Sep 10, 2014 at 6:12 PM, Adam Bordelon <[email protected]> wrote:

> I agree with Niklas that if the executor has sent a terminal status update to the slave, then the task is done and the master should be able to recover those resources. Sending only the oldest status update to the master, especially in the case of framework failover, prevents those resources from being recovered in a timely manner. I see a couple of options for getting around this, each with its own disadvantages.
>
> 1) Send the entire status update stream to the master. Once the master sees the terminal status update, it will removeTask and recover the resources. Future resends of the updates will be forwarded to the scheduler, but the master will ignore the subsequent updates (with a warning and an invalid_update++ metric) as far as its own state for the removed task is concerned. Disadvantage 1: potentially sends a lot of status update messages until the scheduler reregisters and acknowledges the updates. Disadvantage 2: updates could be delivered to the scheduler out of order if some updates are dropped between the slave and the master.
>
> 2) Send only the oldest status update to the master, but with an annotation of the final/terminal state of the task, if any. That way the master can call removeTask to update its internal state for the task (and the UI) and recover the task's resources. While the scheduler is still down, the oldest update will continue to be resent and forwarded, but the master will ignore it (with a warning, as above) as far as its own internal state is concerned. When the scheduler reregisters, the update stream will be forwarded and acknowledged one at a time as before, guaranteeing status update ordering to the scheduler. Disadvantage 1: it seems a bit hacky to tack a terminal state onto a running update. Disadvantage 2: the state endpoint won't show all the status updates until the entire stream actually gets forwarded and acknowledged.
>
> Thoughts?
>
> On Wed, Sep 10, 2014 at 5:55 PM, Vinod Kone <[email protected]> wrote:
>
> > The main reason is to keep the status update manager simple. Also, it is very easy to enforce the order of updates to the master/framework in this model. If we allow multiple updates for a task to be in flight, it's really hard (impossible?) to ensure that we are not delivering out-of-order updates, even in edge cases (failover, network partitions, etc.).
> >
> > On Wed, Sep 10, 2014 at 5:35 PM, Niklas Nielsen <[email protected]> wrote:
> >
> > > Hey Vinod - thanks for chiming in!
> > >
> > > Is there a particular reason for only having one status update in flight? Or, to put it another way, isn't that behavior too strict, given that the master state could present the most recent known state if the status update manager tried to send more than the front of the stream?
> > >
> > > With very long failover timeouts, just waiting for those to elapse seems a bit tedious and hogs the cluster.
> > >
> > > Niklas
> > >
> > > On 10 September 2014 17:18, Vinod Kone <[email protected]> wrote:
> > >
> > > > What you observed is expected because of the way the slave (specifically, the status update manager) operates.
> > > >
> > > > The status update manager only sends the next update for a task once the previous update (if any) has been acked.
> > > > In your case, since TASK_RUNNING was not acked by the framework, the master doesn't know about the TASK_FINISHED update that is queued up by the status update manager.
> > > >
> > > > If the framework never comes back, i.e., the failover timeout elapses, the master shuts down the framework, which releases those resources.
> > > >
> > > > On Wed, Sep 10, 2014 at 4:43 PM, Niklas Nielsen <[email protected]> wrote:
> > > >
> > > > > Here is the log of a mesos-local instance where I reproduced it: https://gist.github.com/nqn/f7ee20601199d70787c0 (here, tasks 10 to 19 are stuck in the running state).
> > > > >
> > > > > There is a lot of output, so here is a filtered log for task 10: https://gist.github.com/nqn/a53e5ea05c5e41cd5a7d
> > > > >
> > > > > At first glance, it looks like the task can't be found when trying to forward the TASK_FINISHED update, because the running update never got acknowledged before the framework disconnected. I may be missing something here.
> > > > >
> > > > > Niklas
> > > > >
> > > > > On 10 September 2014 16:09, Niklas Nielsen <[email protected]> wrote:
> > > > >
> > > > > > Hi guys,
> > > > > >
> > > > > > We have run into a problem that causes tasks that complete while a framework is disconnected (with a failover timeout set) to remain in a running state even though they actually finished.
> > > > > >
> > > > > > Here is a test framework with which we have been able to reproduce the issue: https://gist.github.com/nqn/9b9b1de9123a6e836f54
> > > > > >
> > > > > > It launches many short-lived tasks (1-second sleeps), and when the framework instance is killed, the master reports the tasks as running even after several minutes: http://cl.ly/image/2R3719461e0t/Screen%20Shot%202014-09-10%20at%203.19.39%20PM.png
> > > > > >
> > > > > > When clicking on one of the slaves where, for example, task 49 ran, the slave knows that it completed: http://cl.ly/image/2P410L3m1O1N/Screen%20Shot%202014-09-10%20at%203.21.29%20PM.png
> > > > > >
> > > > > > The tasks only get marked finished when the framework connects again (which it may never do). This is on Mesos 0.20.0, but it also applies to HEAD (as of today).
> > > > > >
> > > > > > Do you guys have any insights into what may be going on here? Is this by design or a bug?
> > > > > >
> > > > > > Thanks,
> > > > > > Niklas
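To summarize the mechanism Vinod describes above: the slave keeps a per-task, ack-gated stream in which only the front update is ever in flight. A minimal sketch of that behavior (again with hypothetical types, not the real status update manager):

    // Minimal sketch of the ack-gated stream described above; the
    // types here are hypothetical, not the real status update manager.
    #include <deque>
    #include <iostream>
    #include <optional>
    #include <string>

    struct StatusUpdate {
      std::string state;  // e.g. "TASK_RUNNING", "TASK_FINISHED"
      std::string uuid;   // identifies the update for acknowledgement
    };

    class TaskUpdateStream {
    public:
      void enqueue(const StatusUpdate& update) { pending.push_back(update); }

      // The update to (re)send is always the front of the stream; later
      // updates stay hidden until the front is acknowledged.
      std::optional<StatusUpdate> next() const {
        if (pending.empty()) {
          return std::nullopt;
        }
        return pending.front();
      }

      // Only an ack of the front update advances the stream.
      bool acknowledge(const std::string& uuid) {
        if (pending.empty() || pending.front().uuid != uuid) {
          return false;
        }
        pending.pop_front();
        return true;
      }

    private:
      std::deque<StatusUpdate> pending;
    };

    int main() {
      TaskUpdateStream stream;
      stream.enqueue({"TASK_RUNNING", "uuid-1"});
      stream.enqueue({"TASK_FINISHED", "uuid-2"});

      // With the framework gone, uuid-1 is never acked, so next() keeps
      // returning TASK_RUNNING and the master never learns the task
      // finished -- the stuck state observed above.
      std::cout << stream.next()->state << std::endl;  // TASK_RUNNING
      std::cout << stream.next()->state << std::endl;  // still TASK_RUNNING

      // Once the framework reconnects and acks, the stream advances.
      stream.acknowledge("uuid-1");
      std::cout << stream.next()->state << std::endl;  // TASK_FINISHED
    }

Until the framework reconnects and acknowledges TASK_RUNNING, the terminal TASK_FINISHED update stays queued on the slave, which is why the master keeps reporting the tasks as running.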
