Steve Niemitz created MESOS-2214:
------------------------------------

             Summary: Mesos slave can't restart if running with --slave_subsystems=blkio or net_cls and checkpointing is enabled
                 Key: MESOS-2214
                 URL: https://issues.apache.org/jira/browse/MESOS-2214
             Project: Mesos
          Issue Type: Bug
            Reporter: Steve Niemitz


Steps to reproduce:
- Enable checkpointing on the slave (on by default)
- Enable checkpointing on the framework
- Start the slave with --slave_subsystems=memory,cpuacct,blkio,net_cls
- Ensure a task is running on the slave
- Restart mesos-slave

Doing so causes this error:
I0113 15:38:46.600033 729216 detector.cpp:433] A new leading master ([email protected]:5050) is detected
I0113 15:38:46.610535 729196 slave.cpp:189] Moving slave process into its own cgroup for subsystem: cpuacct
I0113 15:38:46.618446 729196 slave.cpp:189] Moving slave process into its own cgroup for subsystem: net_cls
A slave (or child process) is still running, please check the process(es) '{ 561866, 561880, 561924, 561977, 561978, 700306, 700319 }' listed in /sys/fs/cgroup/net_cls/mesos/slave/cgroups.proc

A second, smaller bug: the error message above is printed directly to stderr 
instead of going through the logging system.
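
To see which processes are still attached to the slave's cgroup when the restart 
fails, something like the following Python sketch may help. It is only a 
diagnostic aid, not part of Mesos. It assumes a cgroup v1 hierarchy mounted as 
in the error above, and that the kernel file is named cgroup.procs (the error 
message prints "cgroups.proc"). The function names read_cgroup_pids and 
describe_pid are made up for this example.

```python
import os


def read_cgroup_pids(cgroup_dir):
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs, sorted.

    Assumption: the kernel exposes the member processes of a cgroup in a
    file named 'cgroup.procs', one PID per line.
    """
    procs_file = os.path.join(cgroup_dir, "cgroup.procs")
    with open(procs_file) as f:
        return sorted({int(line) for line in f if line.strip()})


def describe_pid(pid, proc_root="/proc"):
    """Return the command name of pid, or None if it has already exited."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except OSError:
        return None


def report(cgroup_dir, proc_root="/proc"):
    """Return one 'pid command' line per process still in the cgroup."""
    lines = []
    for pid in read_cgroup_pids(cgroup_dir):
        name = describe_pid(pid, proc_root) or "<exited>"
        lines.append("%d %s" % (pid, name))
    return lines
```

On the affected host, print("\n".join(report("/sys/fs/cgroup/net_cls/mesos/slave"))) 
would list each lingering PID next to its command name, which should show 
whether the listed processes are executors or tasks left running by 
checkpointing.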



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
