The semantics of these changes will have an impact on the upcoming task
reconciliation work.

@BenM: Can you chime in here on how this fits into the task reconciliation
work that you've been leading?

On Wed, Sep 10, 2014 at 6:12 PM, Adam Bordelon <[email protected]> wrote:

> I agree with Niklas that if the executor has sent a terminal status update
> to the slave, then the task is done and the master should be able to
> recover those resources. Sending only the oldest status update to the
> master, especially in the case of framework failover, prevents these
> resources from being recovered in a timely manner. I see a couple of
> options for getting around this, each with its own disadvantages.
>
> 1) Send the entire status update stream to the master. Once the master
> sees the terminal status update, it will removeTask and recover the
> resources. Future resends of the update will be forwarded to the
> scheduler, but the master will ignore the subsequent updates (with a
> warning and invalid_update++ metrics) as far as its own state for the
> removed task is concerned.
> Disadvantage 1: Potentially sends a lot of status update messages until
> the scheduler reregisters and acknowledges the updates.
> Disadvantage 2: Updates could be sent to the scheduler out of order if
> some updates are dropped between the slave and master.
>
> 2) Send only the oldest status update to the master, but with an
> annotation of the final/terminal state of the task, if any (sketched
> below). That way the master can call removeTask to update its internal
> state for the task (and update the UI) and recover the task's resources.
> While the scheduler is still down, the oldest update will continue to be
> resent and forwarded, but the master will ignore it (with a warning, as
> above) as far as its own internal state is concerned. When the scheduler
> reregisters, the update stream will be forwarded and acknowledged one at
> a time as before, guaranteeing status update ordering to the scheduler.
> Disadvantage 1: It seems a bit hacky to tack a terminal state onto a
> running update.
> Disadvantage 2: The state endpoint won't show all the status updates
> until the entire stream actually gets forwarded and acknowledged.
>
> Thoughts?
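>
> A rough sketch of what option 2 could look like (illustrative only; the
> latestState annotation and the types below are hypothetical, not the
> actual StatusUpdate protobuf or master code):
>
>     #include <deque>
>     #include <optional>
>
>     enum class TaskState { STAGING, RUNNING, FINISHED, FAILED, KILLED, LOST };
>
>     bool isTerminal(TaskState s) {
>       return s == TaskState::FINISHED || s == TaskState::FAILED ||
>              s == TaskState::KILLED || s == TaskState::LOST;
>     }
>
>     struct StatusUpdate {
>       TaskState state;
>       std::optional<TaskState> latestState;  // hypothetical annotation
>     };
>
>     // Slave side: still resend only the front of the stream, but tag it
>     // with the newest known state so the master can act on it.
>     std::optional<StatusUpdate> nextToForward(
>         const std::deque<StatusUpdate>& pending) {  // oldest first
>       if (pending.empty()) return std::nullopt;
>       StatusUpdate front = pending.front();
>       if (isTerminal(pending.back().state)) {
>         front.latestState = pending.back().state;
>       }
>       return front;
>     }
>
>     // Master side: recover resources as soon as a terminal latestState
>     // is seen, then keep forwarding (and ignoring) resends as described.
>     void handle(const StatusUpdate& update) {
>       if (update.latestState && isTerminal(*update.latestState)) {
>         // removeTask(): update internal state and the UI, and recover
>         // the task's resources.
>       }
>       // Forward the update to the scheduler as usual, in order.
>     }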
>
>
> On Wed, Sep 10, 2014 at 5:55 PM, Vinod Kone <[email protected]> wrote:
>
> > The main reason is to keep the status update manager simple. Also, it
> > is very easy to enforce the ordering of updates to the master/framework
> > in this model. If we allow multiple updates for a task to be in flight,
> > it's really hard (impossible?) to ensure that we are not delivering
> > out-of-order updates, even in edge cases (failover, network partitions,
> > etc.).
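> >
> > A minimal sketch of that invariant (simplified, not the actual status
> > update manager code):
> >
> >     #include <cassert>
> >     #include <deque>
> >     #include <string>
> >
> >     // One stream per task. At most one update is in flight; the next
> >     // update goes out only after the previous one has been acked.
> >     struct UpdateStream {
> >       std::deque<std::string> pending;  // oldest first
> >       bool inFlight = false;
> >
> >       void append(const std::string& update) { pending.push_back(update); }
> >
> >       // Always (re)send the front of the stream -- never anything
> >       // newer, so out-of-order delivery is impossible by construction.
> >       const std::string* toSend() {
> >         if (pending.empty()) return nullptr;
> >         inFlight = true;
> >         return &pending.front();
> >       }
> >
> >       // Only an acknowledgement unblocks the next update.
> >       void ack() {
> >         assert(inFlight && !pending.empty());
> >         pending.pop_front();
> >         inFlight = false;
> >       }
> >     };
> >
> > Because only the front is ever on the wire, ordering needs no extra
> > bookkeeping; the tradeoff is that nothing behind an unacknowledged
> > update makes progress.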
> >
> > On Wed, Sep 10, 2014 at 5:35 PM, Niklas Nielsen <[email protected]>
> > wrote:
> >
> > > Hey Vinod - thanks for chiming in!
> > >
> > > Is there a particular reason for only having one status update in
> > > flight? Or, to put it another way, isn't that behavior too strict,
> > > given that the master state could present the most recent known state
> > > if the status update manager sent more than the front of the stream?
> > > Given very long failover timeouts, just waiting for those to elapse
> > > seems a bit tedious and hogs the cluster.
> > >
> > > Niklas
> > >
> > > On 10 September 2014 17:18, Vinod Kone <[email protected]> wrote:
> > >
> > > > What you observed is expected because of the way the slave
> > > > (specifically, the status update manager) operates.
> > > >
> > > > The status update manager only sends the next update for a task once
> > > > the previous update (if one exists) has been acked.
> > > >
> > > > In your case, since TASK_RUNNING was not acked by the framework, the
> > > > master doesn't know about the TASK_FINISHED update that is queued up
> > > > by the status update manager.
> > > >
> > > > If the framework never comes back, i.e., the failover timeout
> > > > elapses, the master shuts down the framework, which releases those
> > > > resources.
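> > > >
> > > > A tiny self-contained demo of that stuck state (assumed semantics,
> > > > not actual Mesos code):
> > > >
> > > >     #include <cstdio>
> > > >     #include <deque>
> > > >     #include <string>
> > > >
> > > >     int main() {
> > > >       // Queued by the status update manager, oldest first.
> > > >       std::deque<std::string> stream = {"TASK_RUNNING", "TASK_FINISHED"};
> > > >
> > > >       // The front is forwarded; the framework is down, so no ack
> > > >       // ever arrives and we never pop_front().
> > > >       std::string masterView = stream.front();
> > > >
> > > >       // TASK_FINISHED stays queued behind the unacked TASK_RUNNING,
> > > >       // so the master holds the task's resources until the framework
> > > >       // failover timeout elapses.
> > > >       std::printf("master sees: %s (still queued: %s)\n",
> > > >                   masterView.c_str(), stream.back().c_str());
> > > >       return 0;
> > > >     }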
> > > >
> > > > On Wed, Sep 10, 2014 at 4:43 PM, Niklas Nielsen <[email protected]>
> > > > wrote:
> > > >
> > > > > Here is the log of a mesos-local instance where I reproduced it:
> > > > > https://gist.github.com/nqn/f7ee20601199d70787c0 (here, tasks 10 to
> > > > > 19 are stuck in the running state).
> > > > > There is a lot of output, so here is a filtered log for task 10:
> > > > > https://gist.github.com/nqn/a53e5ea05c5e41cd5a7d
> > > > >
> > > > > At first glance, it looks like the task can't be found when trying
> > > > > to forward the TASK_FINISHED update, because the TASK_RUNNING
> > > > > update never got acknowledged before the framework disconnected. I
> > > > > may be missing something here.
> > > > >
> > > > > Niklas
> > > > >
> > > > >
> > > > > On 10 September 2014 16:09, Niklas Nielsen <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi guys,
> > > > > >
> > > > > > We have run into a problem that causes tasks which complete
> > > > > > while a framework is disconnected (and has a failover timeout
> > > > > > set) to remain in a running state, even though the tasks
> > > > > > actually finish.
> > > > > >
> > > > > > Here is a test framework we have been able to reproduce the issue
> > > > > > with: https://gist.github.com/nqn/9b9b1de9123a6e836f54
> > > > > > It launches many short-lived tasks (1 second sleep) and, when
> > > > > > killing the framework instance, the master reports the tasks as
> > > > > > running even after several minutes:
> > > > > >
> > > > > > http://cl.ly/image/2R3719461e0t/Screen%20Shot%202014-09-10%20at%203.19.39%20PM.png
> > > > > >
> > > > > > When clicking on one of the slaves where, for example, task 49
> > > > > > runs, the slave knows that it completed:
> > > > > >
> > > > > > http://cl.ly/image/2P410L3m1O1N/Screen%20Shot%202014-09-10%20at%203.21.29%20PM.png
> > > > > >
> > > > > > The tasks only finish when the framework connects again (which
> > > > > > it may never do). This is on Mesos 0.20.0, but it also applies
> > > > > > to HEAD (as of today).
> > > > > > Do you guys have any insights into what may be going on here? Is
> > > > > > this by design or a bug?
> > > > > >
> > > > > > Thanks,
> > > > > > Niklas
> > > > > >
