Re: Non-checkpointing frameworks

2016-10-18 Thread Neil Conway
Hi folks, Thanks for the feedback! On Mon, Oct 17, 2016 at 12:44 PM, Zhitao Li wrote: > +1 to both A to B. > > Do we plan to eventually drop non-checkpointed framework support (possibly > in v2) and declare that all frameworks have to operate in this assumption? I think

Re: Non-checkpointing frameworks

2016-10-17 Thread Aaron Carey
+1 to A and B Aaron Carey Production Engineer - Cloud Pipeline Industrial Light & Magic London 020 3751 9150 On 17 October 2016 at 00:38, Qian Zhang wrote: > and requires operators to enable checkpointing on the slaves. > > > Just curious why operator needs to enable

Re: Non-checkpointing frameworks

2016-10-17 Thread Zameer Manji
Qian, Turns out the --checkpoint flag was made default and removed in Mesos 0.22. On Sun, Oct 16, 2016 at 4:38 PM, Qian Zhang wrote: > and requires operators to enable checkpointing on the slaves. > > > Just curious why operator needs to enable checkpointing on the slaves

Re: Non-checkpointing frameworks

2016-10-17 Thread Zhitao Li
+1 to both A to B. Do we plan to eventually drop non-checkpointed framework support (possibly in v2) and declare that all frameworks have to operate in this assumption? On Mon, Oct 17, 2016 at 1:36 AM, Aaron Carey wrote: > +1 to A and B > > Aaron Carey > Production Engineer -

Re: Non-checkpointing frameworks

2016-10-16 Thread Qian Zhang
> > and requires operators to enable checkpointing on the slaves. Just curious why operators need to enable checkpointing on the slaves (I do not see an agent flag for that). I think checkpointing should be enabled at the framework level rather than at the slave level. Thanks, Qian Zhang On Sun, Oct 16, 2016

Re: Non-checkpointing frameworks

2016-10-15 Thread Zameer Manji
+1 to A and B Aurora has enabled checkpointing for years and requires operators to enable checkpointing on the slaves. On Sat, Oct 15, 2016 at 11:57 AM, Joris Van Remoortere wrote: > I'm in favor of A & B. I find it provides a better "first experience" to > users. > From

Re: Non-checkpointing frameworks

2016-10-15 Thread Joris Van Remoortere
I'm in favor of A & B. I find it provides a better "first experience" to users. From my experience you usually have to have an explicit reason to not want to checkpoint. Most people assume the semantics provided by the checkpoint behavior is default and it can be a frustrating experience for them

Non-checkpointing frameworks

2016-10-14 Thread Neil Conway
Hi folks, I'd like input from individuals who currently use frameworks but do not enable checkpointing. Background: "checkpointing" is a parameter that can be enabled in FrameworkInfo; if enabled, the agent will write the framework PID, executor PIDs, and status updates to disk for any tasks
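For readers wondering where this parameter is set: a framework opts in through the `checkpoint` field of the `FrameworkInfo` it sends when registering. The following is a minimal Python sketch, assuming the framework uses the v1 HTTP scheduler API with JSON encoding (a SUBSCRIBE call sent to the master's `/api/v1/scheduler` endpoint). The helper name `build_subscribe_call` and the example user and framework names are hypothetical; the sketch only builds the request body and does not contact a master.

```python
import json


def build_subscribe_call(user: str, name: str, checkpoint: bool = True) -> str:
    """Build a JSON body for a SUBSCRIBE call (hypothetical helper).

    Assumes the v1 HTTP scheduler API with JSON encoding, where
    FrameworkInfo fields use their protobuf field names.
    """
    framework_info = {
        "user": user,  # required FrameworkInfo field
        "name": name,  # required FrameworkInfo field
        # Opt in to agent-side checkpointing of this framework's state.
        # When omitted, the protobuf default of false applies.
        "checkpoint": checkpoint,
    }
    call = {"type": "SUBSCRIBE", "subscribe": {"framework_info": framework_info}}
    return json.dumps(call)
```

For example, `build_subscribe_call("root", "example-framework")` yields a body whose `framework_info` has `"checkpoint": true`; passing `checkpoint=False` (or leaving the field out entirely) gives the non-checkpointing behavior this thread asks about.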

[jira] [Updated] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

2014-05-02 Thread Benjamin Mahler (JIRA)
to be a regression:
1. Slave re-detects leading Master.
2. Slave re-authenticates with Master.
3. Master sees slave as already activated, calls disconnect().
4. For non-checkpointing frameworks, this call to disconnect() assumes the slave has exited, and will send TASK_LOST.
5. In the case where
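The sequence in steps 2 to 4 can be illustrated with a toy Python model. This is not Mesos code: `Framework`, `SlaveState`, `disconnect`, and `on_authenticated` are hypothetical names, and the model assumes only what the report states, namely that a master which still sees the slave as active calls disconnect(), and that disconnect() reports TASK_LOST for tasks of non-checkpointing frameworks while leaving checkpointing frameworks' tasks in place.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    name: str
    checkpoint: bool  # mirrors FrameworkInfo.checkpoint


@dataclass
class SlaveState:
    activated: bool = False
    tasks: dict = field(default_factory=dict)  # task_id -> Framework


def disconnect(slave: SlaveState) -> list:
    """Toy model of the master disconnecting a slave.

    Tasks of non-checkpointing frameworks are assumed dead and reported
    as TASK_LOST; tasks of checkpointing frameworks are kept, on the
    assumption that the slave may recover them when it comes back.
    """
    slave.activated = False
    lost = []
    for task_id, fw in list(slave.tasks.items()):
        if not fw.checkpoint:
            del slave.tasks[task_id]
            lost.append((task_id, "TASK_LOST"))
    return lost


def on_authenticated(slave: SlaveState) -> list:
    """Step 2: the slave (re-)authenticates with the master."""
    if slave.activated:
        # Step 3: the master still considers the slave active and
        # disconnects it; step 4: non-checkpointing tasks become
        # TASK_LOST even though the slave never exited.
        return disconnect(slave)
    return []
```

With one non-checkpointing task `t1` and one checkpointing task `t2` on an already-activated slave, `on_authenticated` returns `[("t1", "TASK_LOST")]` and leaves only `t2` on the slave, which is the spurious loss the issue describes.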

[jira] [Commented] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

2014-05-02 Thread Adam B (JIRA)
Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks. Key: MESOS-1264 URL: https://issues.apache.org/jira/browse/MESOS-1264 Project

[jira] [Resolved] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

2014-05-02 Thread Vinod Kone (JIRA)
https://reviews.apache.org/r/21017 Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks. Key: MESOS-1264 URL: https://issues.apache.org/jira

[jira] [Comment Edited] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

2014-04-30 Thread Adam B (JIRA)
1) disable the slave in the allocator so it makes no more offers, 2) remove (non-checkpointing) frameworks, which sends the TASK_LOST message you were seeing, and 3) remove all offers on that slave and have the allocator recover those resources. 1) I definitely want the slave disabled
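The three actions listed in this comment can be sketched as a single toy routine. This is a simplified Python model, not the Mesos master or allocator: `Allocator`, `Master`, `deactivate`, `recover`, and the CPU-only resource accounting are all hypothetical, and step 2 is modeled as dropping the non-checkpointing frameworks' tasks on that slave rather than removing whole frameworks.

```python
from dataclasses import dataclass, field


@dataclass
class Allocator:
    active_slaves: set = field(default_factory=set)
    available: dict = field(default_factory=dict)  # slave_id -> cpus

    def deactivate(self, slave_id):
        # A deactivated slave no longer appears in new offers.
        self.active_slaves.discard(slave_id)

    def recover(self, slave_id, cpus):
        # Return previously offered resources to the allocator's pool.
        self.available[slave_id] = self.available.get(slave_id, 0.0) + cpus


@dataclass
class Master:
    allocator: Allocator
    offers: dict = field(default_factory=dict)  # offer_id -> (slave_id, cpus)
    tasks: dict = field(default_factory=dict)   # task_id -> (slave_id, checkpoint)

    def disconnect(self, slave_id):
        updates = []
        # 1) Disable the slave in the allocator so it makes no more offers.
        self.allocator.deactivate(slave_id)
        # 2) Drop tasks of non-checkpointing frameworks, reporting TASK_LOST.
        for task_id, (sid, checkpoint) in list(self.tasks.items()):
            if sid == slave_id and not checkpoint:
                del self.tasks[task_id]
                updates.append((task_id, "TASK_LOST"))
        # 3) Remove outstanding offers on the slave and recover their resources.
        for offer_id, (sid, cpus) in list(self.offers.items()):
            if sid == slave_id:
                del self.offers[offer_id]
                self.allocator.recover(slave_id, cpus)
        return updates
```

Keeping the three steps separate in the model mirrors the comment's point that disabling the slave (step 1) is wanted regardless, while the TASK_LOST side effect of step 2 is what surfaces to non-checkpointing frameworks.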

[jira] [Commented] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

2014-04-28 Thread Benjamin Mahler (JIRA)
of you take a look here? Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks. Key: MESOS-1264 URL: https://issues.apache.org/jira/browse/MESOS

[jira] [Created] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

2014-04-28 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-1264: -- Summary: Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks. Key: MESOS-1264 URL: https://issues.apache.org/jira/browse/MESOS-1264

[jira] [Assigned] (MESOS-1264) Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks.

2014-04-28 Thread Adam B (JIRA)
[ https://issues.apache.org/jira/browse/MESOS-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam B reassigned MESOS-1264: - Assignee: Adam B Slave authentication retries can trigger TASK_LOST for non-checkpointing frameworks