Benjamin Mahler <[email protected]> writes:

> That test is broken on master currently, the ticket is here:
> MESOS-487<https://issues.apache.org/jira/browse/MESOS-487>

And the fix for the broken test is in:
https://reviews.apache.org/r/13034/

Kevin,

The failure on your first run of the tests is currently expected until
that test is fixed.

With respect to subsequent runs, I have seen that before simply from
mounting and unmounting cgroupfs.  There are weird races in play and
weird checks going on, and the unit tests exercise the kernel bugs quite
well.

You can look at /proc/cgroups and /proc/<pid>/cgroup to get some idea
of what is going on.
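
A minimal sketch of that inspection in Python, assuming the formats
documented in cgroups(7): /proc/cgroups has four whitespace-separated
columns (subsys_name, hierarchy, num_cgroups, enabled) and the per-process
file /proc/<pid>/cgroup has lines of the form id:controllers:path.  The
function and type names here are made up for illustration and are not
part of Mesos.

```python
from typing import Dict, List, NamedTuple


class Subsystem(NamedTuple):
    name: str
    hierarchy: int      # 0 means not attached to a mounted cgroup v1 hierarchy
    num_cgroups: int
    enabled: bool


def parse_proc_cgroups(text: str) -> List[Subsystem]:
    """Parse the contents of /proc/cgroups into one record per subsystem."""
    subsystems = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header line.
        if not line or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        subsystems.append(
            Subsystem(name, int(hierarchy), int(num_cgroups), enabled == "1")
        )
    return subsystems


def mounted_subsystems(subsystems: List[Subsystem]) -> List[str]:
    """Names of subsystems currently attached to some hierarchy (ID != 0)."""
    return [s.name for s in subsystems if s.hierarchy != 0]


def parse_pid_cgroup(text: str) -> Dict[str, str]:
    """Map each controller list to the cgroup path the process is in."""
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so that a ':' inside the path is preserved.
        _hierarchy_id, controllers, path = line.split(":", 2)
        membership[controllers] = path
    return membership
```

To use it, read /proc/cgroups and /proc/self/cgroup (or another pid) with
open(...).read() and pass the text to the parsers.  A subsystem that still
shows a nonzero hierarchy after the tests have finished is still attached
somewhere, which is consistent with a later mount failing.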

For myself, when I did not wind up with unkillable or orphaned
processes, I only had to wait a while, possibly coupled with echo 3 >
/proc/sys/vm/drop_caches, and then it was possible to mount cgroup
filesystems again.

I intend to look into these kernel bugs soonish but they aren't exactly
deterministic.

mesos-slave in a running configuration instead of a test configuration
leaves cgroupfs mounted so you are not likely to hit these kernel
problems if you actually start running mesos.

Do be careful about running with a fixed balloon test though.  With an
unfixed kernel and a system with swap enabled it creates effectively
unkillable processes for me.

If you are curious you can find out more about how the tests are failing
by running them with MESOS_VERBOSE=1 make check.

Eric
