Framework failover timeout is orthogonal. Slave checkpointing is, IIRC, entirely slave-side. The framework failover timeout just decides how long after the framework disconnects the master waits before considering it gone (and effectively commencing an rm -rf of its tasks).
On Thursday, November 19, 2015, <meghdoo...@yahoo.com.invalid> wrote:
> I started digging into the code and found the same as well. Thx for
> confirming, Bill.
>
> General question: does the framework failover timeout feature only work if
> checkpoint is set by the framework as well? Or is the checkpoint feature
> strictly slave-side, so the Mesos master will keep tasks running regardless
> of the checkpoint flag value if the framework comes back in time, as long as
> the timeout is set? Guess I can check the Mesos code.
>
> Thx
>
> > On Nov 19, 2015, at 11:15 PM, Bill Farner <wfar...@apache.org> wrote:
> >
> > It was Aurora that drove this requirement, and Aurora only operates in
> > this mode.
> >
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java#L130-L131
> >
> >> On Thu, Nov 19, 2015 at 11:07 PM, <meghdoo...@yahoo.com.invalid> wrote:
> >>
> >> I am guessing Aurora does not use the Mesos checkpoint feature, where
> >> tasks can keep running even if the slave is stopped (say, for an upgrade).
> >> Can this be supported (optionally) if it isn't today, especially as part
> >> of custom executor support?
> >> Mesos slaves enabled checkpointing by default a while back, but the
> >> framework needs to set it as well for the feature to work.
> >>
> >> Thx
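To make the "orthogonal" point concrete, here is a toy model (not Mesos code) of the two independent decisions discussed in this thread. The class `FrameworkSettings` and the functions `master_tears_down` and `tasks_survive_agent_restart` are hypothetical names; the two fields mirror the `checkpoint` and `failover_timeout` (seconds) fields of Mesos's `FrameworkInfo`. The model is deliberately simplified: it ignores other master and agent timeouts and treats the slave-side checkpointing setting as a plain boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameworkSettings:
    """Hypothetical stand-in for the two FrameworkInfo fields discussed above."""
    checkpoint: bool              # mirrors FrameworkInfo.checkpoint
    failover_timeout_secs: float  # mirrors FrameworkInfo.failover_timeout (seconds)


def master_tears_down(fw: FrameworkSettings,
                      disconnected_at: float,
                      reregistered_at: Optional[float]) -> bool:
    """Master-side decision (simplified): the framework is considered gone,
    and its tasks are killed, if it has not re-registered within the failover
    timeout after disconnecting. Note that fw.checkpoint is never consulted."""
    deadline = disconnected_at + fw.failover_timeout_secs
    return reregistered_at is None or reregistered_at > deadline


def tasks_survive_agent_restart(fw: FrameworkSettings,
                                slave_checkpointing: bool) -> bool:
    """Slave-side decision (simplified): running tasks are recovered after the
    slave process restarts only if checkpointing is enabled on the slave AND
    the framework opted in. The failover timeout plays no part here."""
    return slave_checkpointing and fw.checkpoint


if __name__ == "__main__":
    no_cp = FrameworkSettings(checkpoint=False, failover_timeout_secs=60.0)
    with_cp = FrameworkSettings(checkpoint=True, failover_timeout_secs=60.0)

    # Framework reconnects after 30 s: the master keeps its tasks either way.
    print(master_tears_down(no_cp, 0.0, 30.0), master_tears_down(with_cp, 0.0, 30.0))
    # prints: False False

    # Slave restarts (e.g. for an upgrade): only the checkpointing framework keeps its tasks.
    print(tasks_survive_agent_restart(no_cp, True), tasks_survive_agent_restart(with_cp, True))
    # prints: False True
```

The two functions share no inputs besides the settings record, and each reads a different field. That is the sense in which the failover timeout and checkpointing are independent: one governs how long the master waits for a disconnected framework, the other governs whether the slave can recover tasks across its own restart.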