Re: Trying to get task reconciliation to work

David Greenberg Fri, 18 Apr 2014 13:36:12 -0700

Ah, I see. I am just trying to understand what messages are sent for
various reconciliation scenarios at edge cases. I think that infinite task
failover is actually the behavior I want.


Thanks,
David

On Friday, April 18, 2014, Benjamin Mahler <[email protected]>
wrote:

>> So task reconciliation will always tell me if a task is finished when the
>> slave is still running
>
>
> No, as that would imply we kept infinite task history in the Master. As
> soon as you get your final update, assume you will get TASK_LOST for
> subsequent reconciliations.
>
>> If so, these semantics are very convenient for frameworks that fail to
>> failover in a timely manner, and then ask for tasks that belonged to
>> their previous FrameworkID.
>
>
> What problem are you solving here? That sounds a bit bizarre because we
> try to provide isolation between frameworks and thus we try to avoid
> leaking information across frameworks. I realized this is what Vinod
> mentioned, although the API doesn't allow you to ask about a different
> framework's tasks.
>
> But taking a step back, why don't you set an infinite failover timeout for
> your framework if you want to make sure your tasks can be recovered?
>
>
>
> On Fri, Apr 18, 2014 at 11:20 AM, David Greenberg
> <[email protected]> wrote:
>
> So task reconciliation will always tell me if a task is finished when the
> slave is still running, and it will give me TASK_LOST if the slave or task
> is unknown to the master? If so, these semantics are very convenient for
> frameworks that fail to failover in a timely manner, and then ask for tasks
> that belonged to their previous FrameworkID.
>
>
> On Fri, Apr 18, 2014 at 1:55 PM, Benjamin Mahler
> <[email protected]> wrote:
>
> > Vinod, David is asking about tasks that "belong" to the framework in that
> > they were "launched" by it, in which case your answer is not correct. We
> > don't keep track of tasks so we don't know whether the task "belongs" to
> > the framework in this sense.
> >
> > David, you will either receive TASK_LOST or nothing (if the slave for
> > the task is in a transient state).
> >
> > This is determined more so by the SlaveID than the TaskID as the Master
> > does not persistently track tasks.
> >
> > (a) If you're asking about an unknown slave, you will get TASK_LOST.
> > (b) If you're asking about a known slave and an unknown task, you will get
> > TASK_LOST.
> > (c) If you're asking about a known slave and a known task with a different
> > state, you will be sent the latest state.
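
The three cases (a) to (c), plus the "nothing" outcome for a slave in a transient state, can be sketched as a small decision function. This is only a toy model of the semantics described in this thread, not Mesos code: the `reconcile` function, its parameters, and the dictionary layout are all hypothetical, and returning no update when the claimed state already matches is an assumption based on the remark elsewhere in the thread that reconciliation does not repeat what you were already told.

```python
from typing import Dict, Optional, Set

TASK_LOST = "TASK_LOST"


def reconcile(
    slaves: Dict[str, Dict[str, str]],  # hypothetical: slave_id -> {task_id: latest state}
    transient: Set[str],                # hypothetical: slaves currently in a transient state
    slave_id: str,
    task_id: str,
    claimed_state: str,
) -> Optional[str]:
    """Return the status update the framework would receive, or None for no reply."""
    if slave_id in transient:
        # Slave is in a transient state: nothing is sent.
        return None
    tasks = slaves.get(slave_id)
    if tasks is None:
        # (a) Unknown slave.
        return TASK_LOST
    latest = tasks.get(task_id)
    if latest is None:
        # (b) Known slave, unknown task.
        return TASK_LOST
    if latest != claimed_state:
        # (c) Known slave, known task, different state: send the latest state.
        return latest
    # Assumption: states already agree, so no update is sent.
    return None
```

Note that the slave lookup comes first, which matches the point that the outcome is determined more by the SlaveID than by the TaskID.
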
> >
> > If you consider these semantics, you'll realize that you may receive
> > TASK_LOST if you try to reconcile your task that finished correctly. This
> > is why I mentioned the need to persist updates in (1) above. Let's say you
> > receive a terminal update of TASK_FINISHED and then you still try to
> > reconcile against a failed over Master. This new Master will reply with
> > TASK_LOST because it is unaware of the task/slave. So, you will always
> > receive your valid terminal update before getting a TASK_LOST from
> > reconciliation.
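
On the framework side, the advice above amounts to persisting terminal updates and not letting a later TASK_LOST from reconciliation overwrite them. A minimal sketch, assuming a hypothetical `TaskStore` class whose in-memory dictionary stands in for whatever durable storage the framework actually uses:

```python
from typing import Dict, List

# Terminal task states in Mesos at the time of this thread.
TERMINAL = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


class TaskStore:
    """Hypothetical framework-side record of the last known state per task."""

    def __init__(self) -> None:
        # Assumption: a real framework would persist this durably.
        self._states: Dict[str, str] = {}

    def on_status_update(self, task_id: str, state: str) -> str:
        """Record an update and return the state the framework should trust."""
        current = self._states.get(task_id)
        if current in TERMINAL:
            # A terminal update was already recorded; a later TASK_LOST
            # from reconciliation must not replace it.
            return current
        self._states[task_id] = state
        return state

    def tasks_to_reconcile(self) -> List[str]:
        """Only tasks without a recorded terminal state are worth reconciling."""
        return [t for t, s in self._states.items() if s not in TERMINAL]
```

With this in place, a task that finished is never sent for reconciliation again, so the TASK_LOST reply from a failed-over Master does not come up for it.
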
> >
> >
> > On Fri, Apr 18, 2014 at 10:46 AM, Vinod Kone <[email protected]> wrote:
> >
> >> If a framework asks to reconcile a task that doesn't belong to it there
> >> would be no response from the master. This is nice because it avoids
> >> information leak between frameworks.
> >>
> >>
> >> On Fri, Apr 18, 2014 at 5:04 AM, David Greenberg <[email protected]> wrote:
> >>
> >> > Piggybacking onto this thread with a follow up question: what happens if
> >> > you ask the master to reconcile some tasks that weren't launched by your
> >> > framework? Will you get messages that express those tasks were unknown,
> >> > lost, or will nothing respond?
> >> >
> >> >
> >> > On Thursday, April 17, 2014, Sharma Podila <[email protected]> wrote:
> >> >
> >> >> No problem, I have a better understanding now.
> >> >> And it was useful to see the three items you listed explicitly.
> >> >>
> >> >>
> >> >> On Thu, Apr 17, 2014 at 2:39 PM, Benjamin Mahler <[email protected]> wrote:
> >> >>
> >> >> Good to see you were playing around with reconciliation, we should have
> >> >> made the current semantics more clear. Especially in light of the fact
> >> >> that it's not implemented fully until one uses a strict registrar (likely
> >> >> 0.20.0).
> >> >>
> >> >> Think of reconciliation as the fallback mechanism to ensure that state is
> >> >> consistent, it's not designed to be something to inform you of things you
> >> >> were already told (in this case, that the tasks were running). Although we
> >> >> could consider sending updates even when task state remains the same.
> >> >>
> >> >>
> >> >> For the purpose of this conversation, let's say we
>
>
