[ 
https://issues.apache.org/jira/browse/MESOS-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011296#comment-14011296
 ] 

Jie Yu commented on MESOS-1421:
-------------------------------

[~tstclair] regarding namespaces, I think one of our plans is to put each 
container in its own pid namespace so that containers are isolated from one 
another. If that's the case, then even if the slave process goes down, all 
containers (executors) should keep running (as each container does an 
'unshare'), and when the slave restarts, it should be able to reconnect to 
them.
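
The reconnect half of this can be sketched without any namespace setup. What 
follows is a minimal illustration, not Mesos code: the names launch_executor 
and recover and the JSON checkpoint file are hypothetical. It only detaches 
the executor with setsid() instead of placing it in a pid namespace, since 
unsharing a pid namespace needs privileges.

```python
import json
import os
import subprocess
import sys
import tempfile


def launch_executor(checkpoint_path, argv):
    """Start an executor detached from the slave and checkpoint its pid."""
    # start_new_session=True makes the child call setsid(), so it leaves the
    # slave's session and process group. This is NOT a pid namespace; it is
    # only a stand-in that works without privileges.
    proc = subprocess.Popen(argv, start_new_session=True)
    with open(checkpoint_path, "w") as f:
        json.dump({"pid": proc.pid}, f)
    return proc


def is_alive(pid):
    """Probe a pid with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def recover(checkpoint_path):
    """What a restarted slave might do: read the checkpoint, probe the pid.

    Returns the pid if the executor is still running, otherwise None.
    Caveat: a bare pid can be reused by an unrelated process, so a real
    implementation would need a stronger identity check.
    """
    with open(checkpoint_path) as f:
        pid = json.load(f)["pid"]
    return pid if is_alive(pid) else None


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "executor.json")
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    executor = launch_executor(path, sleeper)
    print("recovered pid:", recover(path))
    executor.kill()
    executor.wait()
    print("after exit:", recover(path))
```

The checkpoint file is what lets a new slave process find executors that an 
earlier slave process started; the liveness probe stands in for the actual 
reconnect.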

> Breaking out sub-process tracking from the slave.
> -------------------------------------------------
>
>                 Key: MESOS-1421
>                 URL: https://issues.apache.org/jira/browse/MESOS-1421
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Timothy St. Clair
>
> I recently waded through the cgroups and tracking code to enable 
> systemd support, and I was struck by the amount of work the slave has to do.
> In other grid systems, the slave typically forks a worker whose job is child 
> process tracking, cleanup, etc.
> In the future, if the full extent of namespaces is leveraged and the init 
> process goes down, then the children go down with it.  Breaking out a worker 
> would alleviate that potential problem, but it may be an architectural 
> change.
> Not to mention, it's a clean separation of concerns.
> e.g. 
> slave<>worker<>executor 
> in condor-speak:
> startd<>starter<>job
> http://research.cs.wisc.edu/htcondor/manual/v8.1/3_1Introduction.html#9108
>  
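
The slave<>worker<>executor layering described above can be sketched roughly 
as follows. This is an illustration under assumptions, not how Mesos or 
HTCondor implement it: run_via_worker is a hypothetical name, the code is 
Unix-only because it uses os.fork, and os.waitstatus_to_exitcode requires 
Python 3.9 or later.

```python
import os
import subprocess
import sys


def run_via_worker(executor_argv):
    """Fork a worker that owns one executor; the slave only waits on the worker.

    Returns the exit code reported by the worker.
    """
    worker_pid = os.fork()
    if worker_pid == 0:
        # Worker process: launch the executor, wait for it, report its status.
        status = 127  # reported if the executor could not be started
        try:
            status = subprocess.call(executor_argv)
        finally:
            # subprocess.call returns -N when the executor died from signal N;
            # map that to the shell convention 128 + N.
            os._exit(status if status >= 0 else 128 - status)
    # Slave process: it never touches the executor directly.
    _, wait_status = os.waitpid(worker_pid, 0)
    return os.waitstatus_to_exitcode(wait_status)


if __name__ == "__main__":
    code = run_via_worker([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("executor exit code seen by slave:", code)
```

The slave's only child here is the worker, so per-executor tracking and 
cleanup stay out of the slave process.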



--
This message was sent by Atlassian JIRA
(v6.2#6252)
