[ 
https://issues.apache.org/jira/browse/MESOS-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219197#comment-15219197
 ] 

Guillermo Rodriguez commented on MESOS-5067:
--------------------------------------------

I can ask Swarm, but on another node.

The problem I have right now is that I have 4 Swarm services for HA, so Docker 
tasks can be started on any Swarm instance. Mesos-DNS resolves 
swarm.marathon.mesos to any of the Swarm services. All OK.

Any Swarm service can see any running task on any node, so each one can answer 
with status, info, etc. for any node. Perfect.

My system gets the Docker task UID on startup and can ask any of the Swarm 
services about the task status. Awesome.

But then one service crashes and any task that was registered or started on 
that particular framework is killed. So essentially, if a Swarm instance 
crashes, I lose 25% of my running tasks. Given that Swarm can crash at any time 
(or Marathon restarts it because it's unresponsive for too long), I lose any 
progress in my number crunching.
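
To put a number on that, here is a rough toy model of the situation. It is not 
Mesos or Swarm code: the instance names, the task count, and the even 
round-robin assignment are all assumptions made for illustration. It only shows 
that with tasks spread evenly over 4 frameworks, tearing one down takes a 
quarter of the tasks with it.

```python
def assign_round_robin(task_ids, frameworks):
    """Map each task to a framework, spreading tasks evenly.

    Hypothetical stand-in for DNS-based balancing across Swarm instances.
    """
    owner = {}
    for i, task in enumerate(task_ids):
        owner[task] = frameworks[i % len(frameworks)]
    return owner


def surviving_tasks(owner, crashed_framework):
    """Return the tasks whose owning framework did not crash."""
    return [task for task, fw in owner.items() if fw != crashed_framework]


# Assumed setup: 4 Swarm instances, 100 long-running jobs.
frameworks = ["swarm-1", "swarm-2", "swarm-3", "swarm-4"]
tasks = [f"task-{n}" for n in range(100)]

owner = assign_round_robin(tasks, frameworks)
alive = surviving_tasks(owner, "swarm-3")
lost_fraction = 1 - len(alive) / len(tasks)

print(f"lost {lost_fraction:.0%} of tasks")
```

In practice the split will not be perfectly even, so the real loss per crash 
can be somewhat more or less than 25%.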

So I was wondering if I could launch tasks from a framework without specifying 
the framework ID, for example. Or something like that.

> Killing a framework does not kill framework tasks
> -------------------------------------------------
>
>                 Key: MESOS-5067
>                 URL: https://issues.apache.org/jira/browse/MESOS-5067
>             Project: Mesos
>          Issue Type: Wish
>            Reporter: Guillermo Rodriguez
>
> By default, when a framework is terminated, mesos-master terminates all child 
> tasks for that framework.
> There are some cases where I might like to stop a framework but not kill the 
> tasks of the framework. 
> In my particular case, I have Docker Swarm running. Swarm allows me to send 
> number-crunching jobs to the cluster, and they can run for hours.
> The problem is that Swarm is also quite flaky and can crash at any time. If 
> that happens, all jobs are terminated and all the processing time is lost.
> So, I would like to be able to set some flag for a framework where I tell 
> the Mesos master that the jobs started by the framework should be considered 
> separate from the framework itself, so that the framework can be restarted 
> and the jobs will keep running. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
