Yubao Liu created MESOS-4248:
--------------------------------
Summary: mesos slave can't start in CentOS-7 docker container
Key: MESOS-4248
URL: https://issues.apache.org/jira/browse/MESOS-4248
Project: Mesos
Issue Type: Bug
Components: slave
Affects Versions: 0.26.0
Environment: My host OS is Debian Jessie; the container OS is CentOS 7.2.
# cat /etc/system-release
CentOS Linux release 7.2.1511 (Core)
# rpm -qa |grep mesos
mesosphere-zookeeper-3.4.6-0.1.20141204175332.centos7.x86_64
mesosphere-el-repo-7-1.noarch
mesos-0.26.0-0.2.145.centos701406.x86_64
$ docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 12:59:02 UTC 2015
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 12:59:02 UTC 2015
OS/Arch: linux/amd64
Reporter: Yubao Liu
Running "systemctl start mesos-slave" fails to start mesos-slave:
{code}
# journalctl -u mesos-slave
....
Dec 24 10:35:25 mesos-slave1 systemd[1]: Started Mesos Slave.
Dec 24 10:35:25 mesos-slave1 systemd[1]: Starting Mesos Slave...
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210180 12838 logging.cpp:172] INFO level logging started!
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210603 12838 main.cpp:190] Build: 2015-12-16 23:06:16 by root
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210625 12838 main.cpp:192] Version: 0.26.0
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210634 12838 main.cpp:195] Git tag: 0.26.0
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210644 12838 main.cpp:199] Git SHA: d3717e5c4d1bf4fca5c41cd7ea54fae489028faa
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.210765 12838 containerizer.cpp:142] Using isolation: posix/cpu,posix/mem,filesystem/posix
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.215638 12838 linux_launcher.cpp:103] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.220279 12838 systemd.cpp:128] systemd version `219` detected
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: I1224 10:35:25.227017 12838 systemd.cpp:210] Started systemd slice `mesos_executors.slice`
Dec 24 10:35:25 mesos-slave1 mesos-slave[12845]: Failed to create a containerizer: Could not create MesosContainerizer: Failed to create launcher: Failed to locate systemd cgroups hierarchy: does not exist
Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service: main process exited, code=exited, status=1/FAILURE
Dec 24 10:35:25 mesos-slave1 systemd[1]: Unit mesos-slave.service entered failed state.
Dec 24 10:35:25 mesos-slave1 systemd[1]: mesos-slave.service failed.
{code}
I debugged it with strace: mesos-slave tries to access
"/sys/fs/cgroup/systemd/mesos_executors.slice", but inside the container the
slice is actually at
"/sys/fs/cgroup/systemd/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope/mesos_executors.slice/".
mesos-slave should check "/proc/self/cgroup" to find those intermediate
directories:
{code}
# cat /proc/self/cgroup
8:perf_event:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
7:blkio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
6:net_cls,net_prio:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
5:freezer:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
4:devices:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
3:cpu,cpuacct:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
2:cpuset:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
1:name=systemd:/system.slice/docker-45875efce9019375cd0c5b29bb1a12275fb6033293f9bf3d97d774a1e5d4de52.scope
{code}
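To illustrate the lookup suggested above, here is a minimal Python sketch, not the Mesos implementation. It parses text in the "/proc/self/cgroup" format, finds the entry for the "name=systemd" hierarchy, and prepends that entry's path to the slice name. The function names (systemd_cgroup_path, locate_slice) and the default mount point "/sys/fs/cgroup/systemd" are assumptions chosen for this example.

```python
import os


def systemd_cgroup_path(cgroup_text):
    """Return the cgroup path listed for the name=systemd hierarchy, or None.

    cgroup_text is expected to be in the format of /proc/self/cgroup,
    where each line reads "hierarchy-ID:controller-list:cgroup-path".
    """
    for line in cgroup_text.splitlines():
        # Split into at most three fields so a ':' inside the path survives.
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue
        _, controllers, path = parts
        if "name=systemd" in controllers.split(","):
            return path
    return None


def locate_slice(cgroup_text,
                 slice_name="mesos_executors.slice",
                 mount="/sys/fs/cgroup/systemd"):
    """Build the expected directory of slice_name under the systemd hierarchy.

    Hypothetical helper: 'mount' is assumed to be where the name=systemd
    hierarchy is mounted; a real implementation would discover it.
    """
    path = systemd_cgroup_path(cgroup_text)
    if path is None:
        return None
    # Drop the leading '/' so os.path.join keeps the mount point as prefix.
    return os.path.join(mount, path.lstrip("/"), slice_name)
```

Inside the container, calling locate_slice(open("/proc/self/cgroup").read()) would be expected to yield the longer path that strace showed, whereas on a host where the process sits in the root cgroup ("/") it reduces to "/sys/fs/cgroup/systemd/mesos_executors.slice".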