[
https://issues.apache.org/jira/browse/MESOS-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517529#comment-14517529
]
haosdent commented on MESOS-2554:
---------------------------------
[~jieyu] I am not sure whether my idea for fixing this bug is correct. Let
me describe it.
1. Check --isolation and --slave_subsystems when the slave starts. If they are
inconsistent, print an error message and exit. However, I am not sure how to
do this check when the isolation parameter is "external".
2. Pass slave_subsystems to each containerizer implementation and check
consistency there. If a container does not use a subsystem listed in
--slave_subsystems, move the container to the root of that subsystem's
hierarchy. For example, if container A uses 'mem' for isolation and the slave
is started with 'cpu,mem' as slave_subsystems, then after container A starts,
move it to the root of the cpu cgroup hierarchy.
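The two ideas above could be sketched roughly as follows. This is only an illustration in Python, not Mesos code: the function names and the ISOLATOR_SUBSYSTEMS table (which isolator manages which cgroup subsystems) are assumptions made for the example, and the "external" case is simply reported as undecidable.

```python
# Hypothetical mapping from isolator name to the cgroup subsystems it manages.
# This table is an assumption for illustration, not taken from the Mesos source.
ISOLATOR_SUBSYSTEMS = {
    "cgroups/cpu": {"cpu", "cpuacct"},
    "cgroups/mem": {"memory"},
}


def parse_csv(value):
    """Split a comma-separated flag value into a set of non-empty names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def unused_slave_subsystems(isolation, slave_subsystems):
    """Idea 1: list the slave subsystems that no configured isolator manages.

    Returns None when isolation includes "external", because the subsystems
    used by an external containerizer cannot be derived from the flag alone.
    """
    isolators = parse_csv(isolation)
    if "external" in isolators:
        return None
    used = set()
    for isolator in isolators:
        used |= ISOLATOR_SUBSYSTEMS.get(isolator, set())
    return sorted(parse_csv(slave_subsystems) - used)


def subsystems_to_move_to_root(container_subsystems, slave_subsystems):
    """Idea 2: list the slave subsystems a container does not use itself.

    For each returned subsystem, the container would be moved to the root of
    that subsystem's hierarchy instead of staying in the slave's cgroup.
    """
    return sorted(parse_csv(slave_subsystems) - parse_csv(container_subsystems))
```

With the example from item 2, subsystems_to_move_to_root("mem", "cpu,mem") yields ["cpu"]. For item 1, a non-empty result from unused_slave_subsystems would be the condition for printing an error and exiting.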
I am not sure whether I understand this issue correctly, so please point out
any mistakes. Thank you very much.
> Slave flaps when using --slave_subsystems that are not used for isolation.
> --------------------------------------------------------------------------
>
> Key: MESOS-2554
> URL: https://issues.apache.org/jira/browse/MESOS-2554
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.21.0, 0.21.1, 0.22.0
> Reporter: Jie Yu
> Priority: Critical
>
> Say one uses --slave_subsystems=cpuacct.
> However, if he/she does not use the cpuacct cgroup for isolation, all processes
> forked by the slave (e.g., tasks) will be part of the slave cgroup. This is
> not expected. Also, more importantly, this will cause the slave to flap when
> restarted because there are task processes in the slave's cgroup.
> We should at least add a check during slave startup!