Steve Niemitz created MESOS-2214:
------------------------------------
Summary: Mesos slave can't restart if running with --slave_subsystems=blkio or net_cls and checkpointing is enabled
Key: MESOS-2214
URL: https://issues.apache.org/jira/browse/MESOS-2214
Project: Mesos
Issue Type: Bug
Reporter: Steve Niemitz
Steps to reproduce:
- Enable checkpointing on the slave (on by default)
- Enable checkpointing on the framework
- Start the slave with --slave_subsystems=memory,cpuacct,blkio,net_cls
- Ensure a task is running on the slave
- Restart mesos-slave
Doing so causes this error:
I0113 15:38:46.600033 729216 detector.cpp:433] A new leading master ([email protected]:5050) is detected
I0113 15:38:46.610535 729196 slave.cpp:189] Moving slave process into its own cgroup for subsystem: cpuacct
I0113 15:38:46.618446 729196 slave.cpp:189] Moving slave process into its own cgroup for subsystem: net_cls
A slave (or child process) is still running, please check the process(es) '{ 561866, 561880, 561924, 561977, 561978, 700306, 700319 }' listed in /sys/fs/cgroup/net_cls/mesos/slave/cgroups.proc
A smaller, separate bug: this error message is printed to stderr instead of
going through the logging system.
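To see which processes are still attached to the slave cgroup before a restart, something like the following rough sketch may help. It is not part of Mesos. The helper names (parse_pids, read_cgroup_pids, describe) are made up for illustration, and it assumes the kernel exposes the member PIDs in a file named cgroup.procs inside the cgroup directory (the error message above prints the name as cgroups.proc).

```python
from pathlib import Path


def parse_pids(text):
    """Parse the contents of a cgroup procs file (one PID per line).

    The kernel does not promise the list is sorted or free of
    duplicates, so normalise it here.
    """
    return sorted({int(token) for token in text.split()})


def read_cgroup_pids(cgroup_dir, filename="cgroup.procs"):
    """Return the PIDs currently attached to the cgroup at cgroup_dir.

    The default filename is an assumption; pass another one if your
    hierarchy differs.
    """
    return parse_pids((Path(cgroup_dir) / filename).read_text())


def describe(pid):
    """Best-effort command line for a PID via /proc; '?' if unavailable."""
    try:
        raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    except OSError:
        return "?"
    text = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return text or "?"
```

For the case in this report, one would call read_cgroup_pids("/sys/fs/cgroup/net_cls/mesos/slave") and pass each returned PID to describe() to find out whether the leftover processes are executors or tasks rather than an old slave.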