Hey Jojy, thanks for your reply.  Responses inline.

On Thu, Dec 31, 2015 at 11:31 AM, Jojy Varghese <[email protected]> wrote:

> > Are /foo/bar cgroups hierarchical such that /foo missing would prevent
> >   /foo/bar tasks from being scheduled?  i.e., might that be the root
> >   cause of why the kernel is ignoring these tasks?
>
> Was curious why you said the above. CPU scheduling shares are a function
> of their parent’s CPU bandwidth.
>

This question arose from an observation in my initial email: the contents
of /proc/sched_debug list all of the CFS run queues, but some of those run
queues appear to be missing on the affected hosts.  Usually they look like
this (only including output for the 1st CPU's CFS run queues):

% grep 'cfs_rq\[0\]' /proc/sched_debug
cfs_rq[0]:/mesos/e8aa3b46-8004-466a-9a5e-249d6d19993f
cfs_rq[0]:/mesos
cfs_rq[0]:/

But on the problematic hosts, they look like this:

% grep 'cfs_rq\[0\]' /proc/sched_debug
cfs_rq[0]:/mesos/5cf9a444-e612-4d5b-b8bb-7ee93e44b352
cfs_rq[0]:/

Notably, "cfs_rq[0]:/mesos" is missing on the problematic hosts.
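
The same comparison could be automated across all CPUs rather than
grepping one CPU at a time.  Below is a minimal sketch, assuming the
"cfs_rq[N]:/path" header format shown in the grep output above; the
function name missing_parent_run_queues is made up for illustration, and a
missing entry is only a hint that something is off, not proof of a kernel
bug.

```python
import re

# Matches header lines such as "cfs_rq[0]:/mesos/<uuid>", as seen in the
# grep output from /proc/sched_debug above.
CFS_RQ_RE = re.compile(r"^cfs_rq\[(\d+)\]:(/\S*)")


def missing_parent_run_queues(sched_debug_text):
    """Return sorted (cpu, path) pairs for ancestor cgroups that have no
    cfs_rq entry even though one of their descendants does.

    Hypothetical helper for illustration; the root ("/") is not checked.
    """
    # Collect the set of cgroup paths listed for each CPU.
    seen = {}
    for line in sched_debug_text.splitlines():
        match = CFS_RQ_RE.match(line.strip())
        if match:
            cpu = int(match.group(1))
            seen.setdefault(cpu, set()).add(match.group(2))

    # For every listed path, walk up its ancestors and record the absent ones.
    missing = set()
    for cpu, paths in seen.items():
        for path in paths:
            parent = path.rsplit("/", 1)[0]
            while parent:  # an empty string means we reached the root
                if parent not in paths:
                    missing.add((cpu, parent))
                parent = parent.rsplit("/", 1)[0]
    return sorted(missing)
```

To run it against a live host, pass in the contents of /proc/sched_debug,
e.g. missing_parent_run_queues(open("/proc/sched_debug").read()).  An
empty list means every listed run queue also has its parents listed.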

I'm not sure how that is possible, given my understanding that these
cfs_rq's are created when directories are added to the special cgroups
filesystem.  Since the /cgroup/cpu/mesos dir exists (as well as
/cgroup/cpu/mesos/5cf9a444-e612-4d5b-b8bb-7ee93e44b352/), I don't see how
the CFS run queues for "/mesos" could have been deleted.  I've been trying
to read the kernel cgroup CFS scheduling code, but it's tough for a newb.

Notably, the cgroup settings that I see in /cgroup/cpu/mesos and
/cgroup/cpu/mesos/5cf9a444-e612-4d5b-b8bb-7ee93e44b352 are not suspicious.
i.e., it's not that the cgroup settings of the "parent" /mesos cgroup are
preventing the tasks from being scheduled.  It seems that the cgroup
settings of the parent are simply gone from the kernel.  Poof.
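
For anyone wanting to repeat that check of the settings, here is a rough
sketch that reads the usual cgroup v1 cpu controller files (cpu.shares,
cpu.cfs_period_us, cpu.cfs_quota_us) from a cgroup directory.  The helper
names are made up for illustration, and it assumes those files are present
on the kernel in question; any file that is missing or unreadable is
reported as None.

```python
import os

# Standard cgroup v1 "cpu" controller files; assumed present on this kernel.
CPU_FILES = ("cpu.shares", "cpu.cfs_period_us", "cpu.cfs_quota_us")


def read_cpu_settings(cgroup_dir):
    """Read the integer CPU settings from a cgroup directory.

    Hypothetical helper; missing or unparsable files map to None.
    """
    settings = {}
    for name in CPU_FILES:
        try:
            with open(os.path.join(cgroup_dir, name)) as handle:
                settings[name] = int(handle.read().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def cpu_limit(settings):
    """Return the CFS bandwidth cap in CPUs (quota / period).

    Returns None when no cap applies (quota of -1) or a value is unknown.
    """
    quota = settings.get("cpu.cfs_quota_us")
    period = settings.get("cpu.cfs_period_us")
    if quota is None or period is None or quota < 0 or period <= 0:
        return None
    return quota / period
```

Calling read_cpu_settings("/cgroup/cpu/mesos") and then on the container's
subdirectory gives the parent and child values side by side, and cpu_limit
shows whether either one is capped by a CFS quota.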

At this point I'm assuming that the above observation is indeed the root
cause of the problem, and I'm simply hoping that whatever logic deleted the
"/mesos" run queue is fixed in either a newer kernel or newer mesos version.

Thanks!

- Erik



>
> -Jojy
>
>
> > On Dec 30, 2015, at 6:55 PM, Erik Weathers <[email protected]> wrote:
> >
> > I'm trying to figure out a situation where we see tasks in a mesos
> > container no longer being scheduled by the Linux kernel.  None of the
> > tasks in the container are zombies, nor are they stuck in "Disk sleep"
> > state.  They are all in Running state.  But if I try to strace the
> > processes the strace cmd just hangs.  I've also noticed that none of the
> > RIPs (64-bit instruction pointers) are changing at all in these tasks,
> > and they're not accumulating any cputime.   So the kernel is just not
> > scheduling them.
> >
> > Despite the behavior described above, these non-running tasks *are*
> > listed in the run queues of /proc/sched_debug.  Notably, I have observed
> > that on hosts without this problem that there exist "cfs_rq[N]:/mesos"
> > run queues, but on the hosts that have the broken scheduling, these run
> > queues don't exist, though we still have
> > "cfs_rq[N]:/mesos/<cgroup-UUID>" in /proc/sched_debug.  That is mighty
> > suspicious to me.
> >
> > I'm curious about:
> >
> >   - Has anyone seen similar behavior?
> >   - Are /foo/bar cgroups hierarchical such that /foo missing would
> >   prevent /foo/bar tasks from being scheduled?  i.e., might that be the
> >   root cause of why the kernel is ignoring these tasks?
> >   - What creates the /mesos cfs run queue, and why would that cease to
> >   exist without the subordinate cgroups being cleaned up?
> >      - I'm assuming the creation of the "cpu" cgroup with the path
> >      "/mesos" done by mesos-slave creates this run queue.
> >      - But I'm not sure how/why it would be removed, since I still see a
> >      mesos cgroup in my cgroupfs cpu path (i.e., /cgroup/cpu/mesos
> >      exists).
> >
> > I'm assuming that this is a kernel bug, and I'm hopeful RedHat has
> > patched fixes into newer kernel versions that we are running on other
> > hosts (e.g., 2.6.32-573.7.1.el6).
> >
> > Setup info:
> >
> > Kernel version:  2.6.32-431.el6.x86_64
> > Mesos version:  0.22.1
> > Containerizer: Mesos
> > Isolators: Have seen this behavior with both of these configs:
> >   cgroups/cpu,cgroups/mem
> >   cgroups/cpu,cgroups/mem,namespaces/pid
> >
> > Thanks for any insight or help!
> >
> > - Erik
>
>
