If a framework asks to reconcile a task that doesn't belong to it, the master sends no response. This is nice because it avoids leaking information between frameworks.
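
To make that concrete, here is a rough toy model in Python of the behaviour described above, together with the three rules Ben lists in the quoted message below. It is only a sketch: ToyMaster, ToyScheduler and their methods are hypothetical names invented for illustration, not the Mesos API, and the in-memory dict stands in for whatever durable store a real framework would use. Treating an unknown task as "no response" follows the pre-registrar behaviour Sharma describes, which is an assumption of this sketch.

```python
# Toy model only: none of these classes or methods are part of the Mesos API.
from dataclasses import dataclass, field

TASK_RUNNING = "TASK_RUNNING"
TASK_LOST = "TASK_LOST"


@dataclass
class ToyMaster:
    # Hypothetical master view: task_id -> (framework_id, slave_id, state)
    tasks: dict = field(default_factory=dict)

    def reconcile(self, framework_id, claimed):
        """Return updates only where the master disagrees with the claim."""
        updates = {}
        for task_id, claimed_state in claimed.items():
            entry = self.tasks.get(task_id)
            # Unknown task (assumed pre-registrar behaviour) or a task owned
            # by another framework: the master stays silent.
            if entry is None or entry[0] != framework_id:
                continue
            # Same state as claimed: also silent, nothing new to report.
            if entry[2] != claimed_state:
                updates[task_id] = entry[2]
        return updates


@dataclass
class ToyScheduler:
    framework_id: str
    store: dict = field(default_factory=dict)      # stand-in for durable storage
    placement: dict = field(default_factory=dict)  # task_id -> slave_id

    def status_update(self, task_id, state):
        # Rule (1): persist before returning; the acknowledgement is only
        # sent once this callback has returned.
        self.store[task_id] = state

    def slave_lost(self, slave_id):
        # Rule (2): every task we placed on the lost slave is TASK_LOST.
        for task_id, slave in self.placement.items():
            if slave == slave_id:
                self.status_update(task_id, TASK_LOST)

    def reconcile(self, master):
        # Rule (3): run this periodically, since the "slave lost" signal
        # may never arrive.
        updates = master.reconcile(self.framework_id, dict(self.store))
        for task_id, state in updates.items():
            self.status_update(task_id, state)


master = ToyMaster(tasks={
    "t1": ("A", "s1", TASK_LOST),     # ours, but the master saw it lost
    "t2": ("B", "s2", TASK_RUNNING),  # belongs to another framework
})
sched = ToyScheduler("A", store={"t1": TASK_RUNNING},
                     placement={"t1": "s1", "t3": "s9"})
foreign = master.reconcile("A", {"t2": TASK_RUNNING})
sched.reconcile(master)
sched.slave_lost("s9")
```

In this model, asking about t2 from framework A yields an empty reply, so silence alone cannot distinguish "the master agrees" from "not your task". That is why the periodic reconcile in rule (3) is paired with persisting every update in rule (1).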

On Fri, Apr 18, 2014 at 5:04 AM, David Greenberg <[email protected]> wrote:

> Piggybacking onto this thread with a follow up question: what happens if
> you ask the master to reconcile some tasks that weren't launched by your
> framework? Will you get messages that express those tasks were unknown,
> lost, or will nothing respond?
>
>
> On Thursday, April 17, 2014, Sharma Podila <[email protected]> wrote:
>
>> No problem, I have a better understanding now.
>> And it was useful to see the three items you listed explicitly.
>>
>>
>> On Thu, Apr 17, 2014 at 2:39 PM, Benjamin Mahler <
>> [email protected]> wrote:
>>
>> Good to see you were playing around with reconciliation, we should have
>> made the current semantics more clear. Especially in light of the fact that
>> it's not implemented fully until one uses a strict registrar (likely
>> 0.20.0).
>>
>> Think of reconciliation as the fallback mechanism to ensure that state is
>> consistent, it's not designed to be something to inform you of things you
>> were already told (in this case, that the tasks were running). Although we
>> could consider sending updates even when task state remains the same.
>>
>>
>> For the purpose of this conversation, let's say we're in the 0.20.0
>> world, operating with the registrar. And let's assume your goal is to build
>> a highly available framework (I will be documenting how to do this for
>> 0.20.0):
>>
>> (1) *When you receive a status update, you must persist this information
>> before returning from the statusUpdate() callback*. Once you return from
>> the callback, the driver will acknowledge the slave directly. Slaves will
>> retry status update delivery *until* the acknowledgement is received from
>> the scheduler driver in order to ensure that the framework processed the
>> update.
>>
>> (2) *When you receive a "slave lost" signal, it means that your tasks
>> that were running on that slave are in state TASK_LOST*, and any
>> reconciliation you perform for these tasks will result in a reply of
>> TASK_LOST. Most of the time we'll deliver these TASK_LOST automatically,
>> but with a confluence of Master *and* Slave failovers, we are unaware of
>> which tasks were running on the slave as we do not persist this information
>> in the Master.
>>
>> (3) To guarantee that you have a consistent view of task states. *You
>> must also periodically reconcile task state against the Master*. This is
>> only because the delivery of the "slave lost" signal in (2) is not reliable
>> (the Master could failover after removing a slave but before telling
>> frameworks that the slave was lost).
>>
>> You'll notice that this model forces one to serially persist all status
>> update changes. We are planning to expose mechanisms to allow "batch"
>> acknowledgement of status updates in the lower-level API that benh has
>> given talks about. With a lower-level API, it is possible to build more
>> powerful libraries that hide much of these details!
>>
>> You'll also perhaps notice that only (1) and (3) are strictly required
>> for consistency, but (2) is highly recommended as the vast majority of the
>> time the "slave lost" signal will be delivered and you can take action
>> quickly, without having to rely on periodic reconciliation.
>>
>> Please let me know if anything here was not clear!
>>
>>
>> On Thu, Apr 17, 2014 at 1:47 PM, Sharma Podila <[email protected]> wrote:
>>
>> Should've looked at the code before sending the previous email...
>> master/main.cpp confirmed what I needed to know. It doesn't look like I
>> will be able to use reconcileTasks the way I thought I could. Effectively,
>> a lack of callback could either mean that the master agrees with the
>> requested reconcile task state, or that the task and/or slave is currently
>> unknown. Which makes it an unreliable source of data. I understand this is
>> expected to improve later by leveraging the registrar, but, I suspect
>> there's more to it.
>>
>> I take it then that individual frameworks need to have their own
>> mechanisms to ascertain the state of their tasks.
>>
>>
>> On Thu, Apr 17, 2014 at 12:53 PM, Sharma Podila <[email protected]> wrote:
>>
>> Hello
>>
>>
