Vinod, sure, I'd like to. I'll also look into MESOS-6406 that Neil mentioned when I have time, if no one gets to it first :)

On Tue, Jul 18, 2017 at 1:14 AM, Vinod Kone <vinodk...@apache.org> wrote:

> On Mon, Jul 17, 2017 at 2:55 PM, Meghdoot bhattacharya <
> meghdoo...@yahoo.com.invalid> wrote:
>
> > When there is no master fail over and agents join back after the default
> > 5*15 timeout, we do see tasks getting killed like it used to. Because in
> > this case master has sent task lost to framework.
> > But we are noticing shutdown() executor callback not getting invoked. We
> > started a different thread on it. This is mesos 1.1.
> >
> > Are you trying to say tasks will leak in latest versions and again relies
> > on recon for the regular health check timeout scenario and agent joining
> > back?
> >
>
> There should be no task leaks. After partition awareness code has landed,
> the master no longer shuts down the agents in the above scenario but it
> still shuts down the tasks/executors of the non-partition-aware frameworks.
> So the observable behavior for a framework regarding its tasks/executors
> should not change. The one observable change is that frameworks do not get
> `LostSlaveMessage` (`lostSlave()` callback on the driver) in this case.
>

--
Ilya Pronin