I'm trying to figure out a situation where we see tasks in a Mesos
container no longer being scheduled by the Linux kernel. None of the tasks
in the container are zombies, nor are they stuck in "Disk sleep" state.
They are all in Running state. But if I try to strace the processes, the
strace command just hangs. I've also noticed that none of the RIPs (64-bit
instruction pointers) are changing at all in these tasks, and they're not
accumulating any CPU time. So the kernel is just not scheduling them.
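
In case it helps anyone reproduce the check, here is a rough sketch of how
the "no CPU time accumulating" observation can be made; it just samples the
standard /proc/<pid>/stat fields (3 = state, 14/15 = utime/stime) twice, and
is illustrative rather than the exact commands I ran:

#!/usr/bin/env python
# Sample /proc/<pid>/stat twice and report whether each task's state and
# accumulated CPU ticks changed in between. PIDs come from the command line.
import sys
import time

def read_stat(pid):
    with open("/proc/%s/stat" % pid) as f:
        data = f.read()
    # comm (field 2) may contain spaces, so split after the closing paren.
    rest = data.rsplit(")", 1)[1].split()
    state = rest[0]                               # field 3: R, S, D, Z, ...
    utime, stime = int(rest[11]), int(rest[12])   # fields 14 and 15
    return state, utime + stime

def main(pids, interval=5):
    before = dict((pid, read_stat(pid)) for pid in pids)
    time.sleep(interval)
    for pid in pids:
        state, ticks = read_stat(pid)
        print("pid %s state=%s cpu_ticks_delta=%d"
              % (pid, state, ticks - before[pid][1]))

if __name__ == "__main__":
    main(sys.argv[1:])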

Despite the behavior described above, these stalled tasks *are* listed in
the run queues in /proc/sched_debug. Notably, on hosts without this problem
there are "cfs_rq[N]:/mesos" run queues, but on the hosts with the broken
scheduling those run queues don't exist, though we still have
"cfs_rq[N]:/mesos/<cgroup-UUID>" entries in /proc/sched_debug. That is
mighty suspicious to me.
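
For what it's worth, this is roughly how I'm comparing the two kinds of
hosts: collect every cgroup path that appears on a "cfs_rq[N]:<path>" line
in /proc/sched_debug (the exact line format can vary a bit by kernel
version, so treat this as a sketch):

#!/usr/bin/env python
# Print the set of cgroup paths that have a cfs_rq entry in /proc/sched_debug.
# On a healthy host the output includes "/mesos"; on the broken hosts I only
# see "/mesos/<cgroup-UUID>" entries.
def sched_debug_cfs_paths(path="/proc/sched_debug"):
    paths = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("cfs_rq["):
                # Lines look like "cfs_rq[3]:/mesos/<cgroup-UUID>"
                paths.add(line.split(":", 1)[1])
    return paths

if __name__ == "__main__":
    for p in sorted(sched_debug_cfs_paths()):
        print(p)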

I'm curious about:

   - Has anyone seen similar behavior?
   - Are /foo/bar cgroups hierarchical, such that a missing /foo would
   prevent /foo/bar tasks from being scheduled? I.e., could that be the root
   cause of the kernel ignoring these tasks?
   - What creates the /mesos cfs run queue, and why would that cease to
   exist without the subordinate cgroups being cleaned up?
      - I'm assuming this run queue is created when mesos-slave creates the
      "cpu" cgroup at the path "/mesos".
      - But I'm not sure how/why it would be removed, since I still see a
      mesos cgroup in my cgroupfs cpu path (i.e., /cgroup/cpu/mesos exists);
      see the cross-check sketch after this list.
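
To cross-check the cgroupfs side, here's a sketch of what I mean: confirm
/cgroup/cpu/mesos still exists, list its child cgroups, and show which cpu
cgroup each stuck PID claims to be in according to /proc/<pid>/cgroup. The
/cgroup/cpu mount point matches our setup; adjust it if your cpu controller
is mounted elsewhere.

#!/usr/bin/env python
# Compare what exists under the cpu cgroup hierarchy with what the stuck
# tasks report in /proc/<pid>/cgroup.
import os
import sys

CPU_CGROUP_ROOT = "/cgroup/cpu"   # cpu controller mount point on our hosts

def list_mesos_cgroups(root=CPU_CGROUP_ROOT):
    mesos = os.path.join(root, "mesos")
    print("%s exists: %s" % (mesos, os.path.isdir(mesos)))
    if os.path.isdir(mesos):
        for name in sorted(os.listdir(mesos)):
            if os.path.isdir(os.path.join(mesos, name)):
                print("  child cgroup: %s" % name)

def cpu_cgroup_of(pid):
    # /proc/<pid>/cgroup lines look like "<id>:<controllers>:<path>"
    with open("/proc/%s/cgroup" % pid) as f:
        for line in f:
            _, controllers, path = line.rstrip("\n").split(":", 2)
            if "cpu" in controllers.split(","):
                return path
    return None

if __name__ == "__main__":
    list_mesos_cgroups()
    for pid in sys.argv[1:]:
        print("pid %s -> cpu cgroup %s" % (pid, cpu_cgroup_of(pid)))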

I'm assuming that this is a kernel bug, and I'm hopeful Red Hat has already
fixed it in the newer kernel versions we're running on other hosts (e.g.,
2.6.32-573.7.1.el6).

Setup info:

Kernel version:  2.6.32-431.el6.x86_64
Mesos version:  0.22.1
Containerizer: Mesos
Isolators: Have seen this behavior with both of these configs:
   cgroups/cpu,cgroups/mem
   cgroups/cpu,cgroups/mem,namespaces/pid

Thanks for any insight or help!

- Erik
