Ah, I see. I am just trying to understand what messages are sent for various reconciliation scenarios at edge cases. I think that infinite task failover is actually the behavior I want.
Thanks,
David

On Friday, April 18, 2014, Benjamin Mahler <[email protected]> wrote:

>> So task reconciliation will always tell me if a task is finished when the
>> slave is still running
>
> No, as that would imply we kept infinite task history in the Master. As
> soon as you get your final update, assume you will get TASK_LOST for
> subsequent reconciliations.
>
>> If so, these semantics are very convenient for
>> frameworks that fail to failover in a timely manner, and then ask for
>> tasks that belonged to their previous FrameworkID.
>
> What problem are you solving here? That sounds a bit bizarre because we
> try to provide isolation between frameworks and thus we try to avoid
> leaking information across frameworks. I realized this is what Vinod
> mentioned, although the API doesn't allow you to ask about a different
> framework's tasks.
>
> But taking a step back, why don't you set an infinite failover timeout for
> your framework if you want to make sure your tasks can be recovered?
>
> On Fri, Apr 18, 2014 at 11:20 AM, David Greenberg <[email protected]> wrote:
>
> So task reconciliation will always tell me if a task is finished when the
> slave is still running, and it will give me TASK_LOST if the slave or task
> is unknown to the master? If so, these semantics are very convenient for
> frameworks that fail to failover in a timely manner, and then ask for tasks
> that belonged to their previous FrameworkID.
>
> On Fri, Apr 18, 2014 at 1:55 PM, Benjamin Mahler <[email protected]> wrote:
>
> > Vinod, David is asking about tasks that "belong" to the framework in that
> > they were "launched" by it, in which case your answer is not correct. We
> > don't keep track of tasks so we don't know whether the task "belongs" to
> > the framework in this sense.
> >
> > David, you will either receive TASK_LOST or nothing (if the slave for
> > the task is in a transient state).
> >
> > This is determined more so by the SlaveID than the TaskID as the Master
> > does not persistently track tasks.
> >
> > (a) If you're asking about an unknown slave, you will get TASK_LOST.
> > (b) If you're asking about a known slave and an unknown task, you will
> >     get TASK_LOST.
> > (c) If you're asking about a known slave and a known task with a
> >     different state, you will be sent the latest state.
> >
> > If you consider these semantics, you'll realize that you may receive
> > TASK_LOST if you try to reconcile your task that finished correctly. This
> > is why I mentioned the need to persist updates in (1) above. Let's say
> > you receive a terminal update of TASK_FINISHED and then you still try to
> > reconcile against a failed over Master. This new Master will reply with
> > TASK_LOST because it is unaware of the task/slave. So, you will always
> > receive your valid terminal update before getting a TASK_LOST from
> > reconciliation.
> >
> > On Fri, Apr 18, 2014 at 10:46 AM, Vinod Kone <[email protected]> wrote:
> >
> >> If a framework asks to reconcile a task that doesn't belong to it there
> >> would be no response from the master. This is nice because it avoids
> >> information leak between frameworks.
> >>
> >> On Fri, Apr 18, 2014 at 5:04 AM, David Greenberg <[email protected]> wrote:
> >>
> >> > Piggybacking onto this thread with a follow up question: what happens
> >> > if you ask the master to reconcile some tasks that weren't launched by
> >> > your framework? Will you get messages that express those tasks were
> >> > unknown, lost, or will nothing respond?
> >> >
> >> > On Thursday, April 17, 2014, Sharma Podila <[email protected]> wrote:
> >> >
> >> >> No problem, I have a better understanding now.
> >> >> And it was useful to see the three items you listed explicitly.
> >> >>
> >> >> On Thu, Apr 17, 2014 at 2:39 PM, Benjamin Mahler <[email protected]> wrote:
> >> >>
> >> >> Good to see you were playing around with reconciliation, we should
> >> >> have made the current semantics more clear. Especially in light of the
> >> >> fact that it's not implemented fully until one uses a strict registrar
> >> >> (likely 0.20.0).
> >> >>
> >> >> Think of reconciliation as the fallback mechanism to ensure that
> >> >> state is consistent, it's not designed to be something to inform you
> >> >> of things you were already told (in this case, that the tasks were
> >> >> running). Although we could consider sending updates even when task
> >> >> state remains the same.
> >> >>
> >> >> For the purpose of this conversation, let's say we
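
For readers skimming the thread, the reconciliation cases (a) through (c) that Benjamin lists, plus the "nothing" reply for a slave in a transient state, can be sketched as a small decision function. This is only a toy model of the semantics described in the thread, not Mesos code: the function name `reconcile`, its parameters, and the dictionary layout of the master's view are all hypothetical, chosen for illustration.

```python
from typing import Dict, Optional, Set


def reconcile(
    known_slaves: Dict[str, Dict[str, str]],  # hypothetical: slave_id -> {task_id: state}
    transient_slaves: Set[str],               # hypothetical: slaves in a transient state
    slave_id: str,
    task_id: str,
    framework_state: str,
) -> Optional[str]:
    """Return the state the framework would be sent, or None for no reply."""
    if slave_id in transient_slaves:
        return None                # slave in a transient state: no reply
    if slave_id not in known_slaves:
        return "TASK_LOST"         # (a) unknown slave
    tasks = known_slaves[slave_id]
    if task_id not in tasks:
        return "TASK_LOST"         # (b) known slave, unknown task
    if tasks[task_id] != framework_state:
        return tasks[task_id]      # (c) known task, different state
    return None                    # same state: nothing new to report


# A freshly failed-over master knows no slaves, so a task that already
# finished is reported as TASK_LOST, as described in the thread.
fresh_master: Dict[str, Dict[str, str]] = {}
after_failover = reconcile(fresh_master, set(), "slave-1", "task-1", "TASK_FINISHED")
```

Under this model a framework has to persist the terminal update it already received, since a later reconciliation can only answer TASK_LOST for that task.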
