[jira] [Commented] (MESOS-1837) failed to determine cgroup for the 'cpu' subsystem

Casey Sybrandy (JIRA) Mon, 12 Jan 2015 09:41:47 -0800

    [ https://issues.apache.org/jira/browse/MESOS-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273817#comment-14273817 ]


Casey Sybrandy commented on MESOS-1837:
---------------------------------------

Hello,

We are running into this issue, and it may be behind some strange behavior we
are seeing.  I'm running Mesos 0.21.0 and Marathon 0.7.5, and I can submit
tasks that run a container via Marathon just fine.  However, if I scale to
more than one task (I've tried 3, 14, and 16, as we only have 13 CPUs), several
stay in the running state while others stop and start repeatedly.  In the
14-task case, I would see anywhere from 9 to 12 running tasks, never 14.
Looking at the logs on one of our machines, I see the following pattern:

* Handling status update TASK_FINISHED...
* Failed to update resources (e.g. this issue with cgroups)
* Received status update TASK_FINISHED...
* Destroying container...
* Running docker kill on container...
* Forwarding the update TASK_FINISHED...
* Sending acknowledgement for status update TASK_FINISHED...

I'm not sure if the two are related, but I wanted to mention it in case they 
are because this is annoyingly weird.

To help answer the last question by Timothy Chen: on my local system, I
compared the number of process directories (/proc/<PID>) to the number that
contain a cgroup file, and I consistently see that they are off by one (i.e.
one PID has no cgroup).  To me, this suggests that while everything is
running, the cgroup does exist for the PID.
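
For anyone who wants to repeat that comparison, here is a minimal Python
sketch of the check.  The function name count_cgroup_coverage and its
proc_root parameter are my own choices for illustration, not anything from
Mesos.  It counts the numeric entries under /proc and how many of them have a
readable cgroup file.  Note that a process can exit between the directory
listing and the open, which on its own could account for a small mismatch.

```python
import os


def count_cgroup_coverage(proc_root="/proc"):
    """Return (pid_dirs, pid_dirs_with_readable_cgroup) under proc_root."""
    total = 0
    with_cgroup = 0
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        total += 1
        try:
            with open(os.path.join(proc_root, name, "cgroup")) as f:
                f.read()
            with_cgroup += 1
        except OSError:
            # The process exited after the listing, or the file is unreadable.
            pass
    return total, with_cgroup


if __name__ == "__main__":
    total, with_cgroup = count_cgroup_coverage()
    print("PID directories: %d, with cgroup file: %d" % (total, with_cgroup))
```

Running it a few times while tasks are starting and stopping should show
whether the difference stays at one or varies with task churn.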

> failed to determine cgroup for the 'cpu' subsystem
> --------------------------------------------------
>
>                 Key: MESOS-1837
>                 URL: https://issues.apache.org/jira/browse/MESOS-1837
>             Project: Mesos
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.20.1
>         Environment: Ubuntu 14.04
>            Reporter: Chris Fortier
>
> Attempting to launch Docker container with Marathon. Container is launched 
> then fails. 
> A search of /var/log/syslog reveals:
> Sep 27 03:01:43 vagrant-ubuntu-trusty-64 mesos-slave[1409]: E0927 
> 03:01:43.546957  1463 slave.cpp:2205] Failed to update resources for 
> container 8c2429d9-f090-4443-8108-0206ca37f3fd of executor 
> hello-world.970dbe74-45f2-11e4-8b1d-56847afe9799 running task 
> hello-world.970dbe74-45f2-11e4-8b1d-56847afe9799 on status update for 
> terminal task, destroying container: Failed to determine cgroup for the 'cpu' 
> subsystem: Failed to read /proc/9792/cgroup: Failed to open file 
> '/proc/9792/cgroup': No such file or directory
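
The lookup that fails in the log above can be illustrated with a short Python
sketch.  This is not the Mesos implementation (which is C++); the function
name cpu_cgroup and the proc_root parameter are hypothetical.  It relies only
on the documented /proc/<PID>/cgroup line format
"hierarchy-ID:controller-list:cgroup-path", and it returns None instead of
failing when the process has already exited, which is the situation the
"No such file or directory" message describes.

```python
def cpu_cgroup(pid, proc_root="/proc"):
    """Return the cgroup path of the 'cpu' subsystem for pid, or None."""
    path = "%s/%s/cgroup" % (proc_root, pid)
    try:
        with open(path) as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        return None  # the process is gone, so /proc/<pid> no longer exists
    for line in lines:
        parts = line.split(":", 2)  # hierarchy-ID : controllers : path
        if len(parts) != 3:
            continue  # ignore malformed lines
        controllers = parts[1].split(",")
        if "cpu" in controllers:
            return parts[2]
    return None  # no hierarchy with the 'cpu' controller attached
```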



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
