> On April 30, 2013, 5:33 p.m., Ben Mahler wrote: > > src/tests/cgroups_tests.cpp, lines 162-175 > > <https://reviews.apache.org/r/10808/diff/1/?file=285107#file285107line162> > > > > I wish there was a simpler way to do this. Do you think this is a > > kernel bug? (Note that we haven't encountered this on CentOS). > > > > If so, and it occurs on kernels we want to support, is this only an > > issue during the tests, or could there be issues in a production > > environment? > > Thomas Marshall wrote: > I haven't been able to reproduce this on Linux kernel version 2.6, 3.2, > or 3.8, only 3.5, so it does seem to be a kernel bug. Unfortunately, 3.5 > happens to be the kernel version that ships with the latest LTS release of > Ubuntu, which we in the AMP lab were hoping to support along with CentOS 6.4. > > I believe that this is extremely unlikely to come up in production > because it is related to unmounting hierarchies, and the CgroupsIsolator only > mounts one hierarchy for each slave, so unless you killed a slave and then > immediately brought up another one on the same machine, there shouldn't be a > problem (and if you're killing the slave manually, there's no way you'll > launch the new one fast enough). > > As far as other ways to make this work, I looked into the possibility of, > instead of waiting for the cleanup, just retrying the mount some number of > times, but there's no way to really make this work for the NoHierarchy test, > since it doesn't do any mounting in SetUp. > > The simplest alternative would be to add an os::sleep(Seconds(0.1)) to > the end of TearDown, which is a little less hack-y in that I haven't been > able to find any documentation saying that the write is /proc/cgroups is a > logical event to wait on, but it'll slow the tests down, and affects systems > where this bug isn't present. > > Ben Mahler wrote: > Interesting.. taking a step back, is this issue attributed to the fact > that the cgroups tests currently use fixed names? Can we change that?
No, the problem is that subsystems can only be attached to a single hierarchy, so if the hierarchy from a previous test isn't fully unmounted yet trying to mount a new hierarchy will fail, regardless of what the hierarchy or the cgroups are named. Since different tests use different subsystems, we can't just reuse the hierarchy either. - Thomas ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/10808/#review19961 ----------------------------------------------------------- On May 6, 2013, 10:36 p.m., Thomas Marshall wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/10808/ > ----------------------------------------------------------- > > (Updated May 6, 2013, 10:36 p.m.) > > > Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler. > > > Description > ------- > > Previously, our the CgroupsAnyHierarchy*Tests were not waiting long enough > for the cgroups created to be cleaned up before the test finished, resulting > in failures when the next test was unable to create cgroups for itself. This > patch attempts to fix that by having CgroupsAnyHierarchy::TearDown wait until > the number of cgroups each subsystem is attached to, documented in > /proc/cgroups, is the same as it was before the test ran. > > > This addresses bug MESOS-449. > https://issues.apache.org/jira/browse/MESOS-449 > > > Diffs > ----- > > src/linux/cgroups.hpp 3e86828 > src/linux/cgroups.cpp cfdc3b2 > src/tests/cgroups_tests.cpp 3734315 > > Diff: https://reviews.apache.org/r/10808/diff/ > > > Testing > ------- > > On both Ubuntu and CentOS, on both clean installs and in the presence of > pre-existing cgroups on the system (such as those left behind by > BalloonFramework failing): > > bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=1000 > --gtest_filter=*CgroupsAny* > > > Thanks, > > Thomas Marshall > >
