[ https://issues.apache.org/jira/browse/MESOS-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221645#comment-14221645 ]

Charles Baker commented on MESOS-1837:
--------------------------------------

I hit this exact same issue on CentOS 6.5 and was able (through much trial and
error!) to figure out that I had not requested enough memory for my app from
Marathon. Mesos didn't know anything was wrong until the app suddenly died and
the /proc/<pid> directory went away. Troubleshooting was complicated by the
fact that I did not get any Java OOM exceptions on the stdout or stderr
streams, but a tell-tale sign was that my app would only partially start up
and never finished initializing. I wonder whether Marathon knew the state of
the app at this point? I did not see anything in the syslog indicating so.

As an aside, I will say that setting isolation, cgroups_root and
cgroups_hierarchy correctly is very important, and the right values differ
radically between CentOS and Ubuntu. However, when those were wrong (for
example, isolation set to cgroups while cgroups_root and cgroups_hierarchy
were incorrect), the main symptom was that the slave did not start up at all.
The real problem with this particular error is that /proc/<pid> gets pulled
out from under Mesos.

The error message itself is actually pretty specific, but I imagine this can
happen not just with insufficient memory but with any kind of problem that
causes the app to terminate abruptly.
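For anyone else debugging this: per the log below, the slave determines the
container's 'cpu' cgroup by reading /proc/<pid>/cgroup. Here is a minimal
Python sketch of that kind of lookup (my own illustration, not the Mesos C++
code; the function names are hypothetical), assuming the cgroup v1 layout
where each line reads hierarchy-ID:controller-list:cgroup-path. It shows that
once the process has exited, the file is simply gone and the open fails with
ENOENT, the same "No such file or directory" seen in the syslog.

```python
import os
from typing import Optional


def find_cgroup(proc_cgroup_text: str, subsystem: str) -> Optional[str]:
    """Return the cgroup path for `subsystem` from /proc/<pid>/cgroup contents (cgroup v1 layout)."""
    for line in proc_cgroup_text.splitlines():
        # Each line is "hierarchy-ID:controller-list:cgroup-path"; the path may itself contain ':'.
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed or empty lines
        _hierarchy_id, controllers, cgroup_path = parts
        if subsystem in controllers.split(","):
            return cgroup_path
    return None


def cgroup_for_subsystem(pid: int, subsystem: str) -> Optional[str]:
    """Look up the cgroup of a process; return None if /proc/<pid> has already vanished."""
    try:
        with open(f"/proc/{pid}/cgroup") as f:
            text = f.read()
    except FileNotFoundError:
        # The process exited (e.g. it was OOM-killed), so /proc/<pid> no longer exists.
        # This is the ENOENT ("No such file or directory") case from the slave log.
        return None
    return find_cgroup(text, subsystem)


if __name__ == "__main__":
    # Hypothetical usage: the current process, then a PID that cannot exist on Linux.
    print(cgroup_for_subsystem(os.getpid(), "cpu"))
    print(cgroup_for_subsystem(999999999, "cpu"))
```

Returning None instead of raising is just a choice for the sketch; as the log
shows, Mesos treats the missing file as an error and destroys the container.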

> failed to determine cgroup for the 'cpu' subsystem
> --------------------------------------------------
>
>                 Key: MESOS-1837
>                 URL: https://issues.apache.org/jira/browse/MESOS-1837
>             Project: Mesos
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.20.1
>         Environment: Ubuntu 14.04
>            Reporter: Chris Fortier
>
> Attempting to launch Docker container with Marathon. Container is launched 
> then fails. 
> A search of /var/log/syslog reveals:
> Sep 27 03:01:43 vagrant-ubuntu-trusty-64 mesos-slave[1409]: E0927 
> 03:01:43.546957  1463 slave.cpp:2205] Failed to update resources for 
> container 8c2429d9-f090-4443-8108-0206ca37f3fd of executor 
> hello-world.970dbe74-45f2-11e4-8b1d-56847afe9799 running task 
> hello-world.970dbe74-45f2-11e4-8b1d-56847afe9799 on status update for 
> terminal task, destroying container: Failed to determine cgroup for the 'cpu' 
> subsystem: Failed to read /proc/9792/cgroup: Failed to open file 
> '/proc/9792/cgroup': No such file or directory



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
