[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034701#comment-15034701 ]
Jojy Varghese commented on MESOS-4025:
--------------------------------------

On debian8:
{code}
[ RUN ] SlaveRecoveryTest/0.Reboot
I1201 21:57:11.562711 7964 exec.cpp:136] Version: 0.26.0
I1201 21:57:11.571506 7978 exec.cpp:210] Executor registered on slave 00a179f0-f087-4054-a0c7-c15281d5e7ff-S0
Registered executor on debian8
Starting task 791255fc-88dd-452e-ba12-6b2dfced99a0
Forked command at 7987
sh -c 'sleep 1000'
I1201 21:57:11.640627 7982 exec.cpp:383] Executor asked to shutdown
Shutting down
Sending SIGTERM to process tree at pid 7987
Killing the following process trees:
[
-+- 7987 sh -c sleep 1000
 \--- 7988 sleep 1000
]
Command terminated with signal Terminated (pid: 7987)
[ OK ] SlaveRecoveryTest/0.Reboot (1730 ms)
[ RUN ] SlaveRecoveryTest/0.GCExecutor
2015-12-01 21:57:13,187:1473(0x7f9bf4e36700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:44262] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I1201 21:57:13.296581 8012 exec.cpp:136] Version: 0.26.0
I1201 21:57:13.305498 8028 exec.cpp:210] Executor registered on slave 44a46bd2-d24a-48d6-bd62-492c15845841-S0
Registered executor on debian8
Starting task 8affc624-c95d-43f5-a2b9-967663c3151b
sh -c 'sleep 1000'
Forked command at 8035
../../src/tests/mesos.cpp:781: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave': Device or resource busy
*** Aborted at 1449007033 (unix time) try "date -d @1449007033" if you are using GNU date ***
PC: @ 0x14b079e testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 1473 (TID 0x7f9c3db5d7c0) from PID 0; stack trace: ***
    @ 0x7f9c28c2166c os::Linux::chained_handler()
    @ 0x7f9c28c25a0a JVM_handle_linux_signal
    @ 0x7f9c374728d0 (unknown)
    @ 0x14b079e testing::UnitTest::AddTestPartResult()
    @ 0x14a51d7 testing::internal::AssertHelper::operator=()
    @ 0xf564c1 mesos::internal::tests::ContainerizerTest<>::TearDown()
    @ 0x14ce2c0 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @ 0x14c9238 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x14aa5c0 testing::Test::Run()
    @ 0x14aad05 testing::TestInfo::Run()
    @ 0x14ab340 testing::TestCase::Run()
    @ 0x14b1c8f testing::internal::UnitTestImpl::RunAllTests()
    @ 0x14cef4f testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @ 0x14c9d8e testing::internal::HandleExceptionsInMethodIfSupported<>()
    @ 0x14b09bf testing::UnitTest::Run()
    @ 0xd63df2 RUN_ALL_TESTS()
    @ 0xd639d0 main
    @ 0x7f9c370dbb45 (unknown)
    @ 0x9588e9 (unknown)
{code}

* The crash was inside *ContainerizerTest<slave::MesosContainerizer>::TearDown*.
* The assertion *AWAIT_READY(cgroups::destroy(hierarchy, cgroup));* failed. The cgroup in question was */sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave* as seen from the log above.

> SlaveRecoveryTest/0.GCExecutor is flaky.
> ----------------------------------------
>
>                 Key: MESOS-4025
>                 URL: https://issues.apache.org/jira/browse/MESOS-4025
>             Project: Mesos
>          Issue Type: Bug
>   Affects Versions: 0.26.0
>            Reporter: Till Toenshoff
>              Labels: test
>
> Build was SSL enabled (--enable-ssl, --enable-libevent). The build was based
> on 0.26.0-rc1.
> Testsuite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833 1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212 1049 exec.cpp:210] Executor registered on slave dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are using GNU date ***
> PC: @ 0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; stack trace: ***
>     @ 0x7f1be92b80b7 os::Linux::chained_handler()
>     @ 0x7f1be92bc219 JVM_handle_linux_signal
>     @ 0x7f1bf7bbc340 (unknown)
>     @ 0x1443e9a testing::UnitTest::AddTestPartResult()
>     @ 0x1438b99 testing::internal::AssertHelper::operator=()
>     @ 0xf0b3bb mesos::internal::tests::ContainerizerTest<>::TearDown()
>     @ 0x1461882 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @ 0x145c6f8 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @ 0x143de4a testing::Test::Run()
>     @ 0x143e584 testing::TestInfo::Run()
>     @ 0x143ebca testing::TestCase::Run()
>     @ 0x1445312 testing::internal::UnitTestImpl::RunAllTests()
>     @ 0x14624a7 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @ 0x145d26e testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @ 0x14440ae testing::UnitTest::Run()
>     @ 0xd15cd4 RUN_ALL_TESTS()
>     @ 0xd158c1 main
>     @ 0x7f1bf7808ec5 (unknown)
>     @ 0x913009 (unknown)
> {noformat}
> My Vagrantfile generator:
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-"
>
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
>     vb.memory = ${VAGRANT_MEM}
>     vb.cpus = ${VAGRANT_CPUS}
>     vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
>     vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
>     vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
>     vb.memory = ${VAGRANT_MEM}
>     vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
>     sudo apt-get update
>     sudo apt-get -y install openjdk-7-jdk autoconf libtool
>     sudo apt-get -y install build-essential python-dev python-boto \
>       libcurl4-nss-dev libsasl2-dev maven \
>       libapr1-dev libsvn-dev libssl-dev libevent-dev
>     sudo apt-get -y install git
>     sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem kicks in frequently in my tests: I'd say in more than 10% but
> less than 50% of runs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
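
To put a number on the failure rate estimated in the report above, one option is to run only the affected test repeatedly and count the failing runs. The following is a minimal sketch, not part of Mesos: `count_failures` is a hypothetical helper, and the usage comment assumes `mesos-tests.sh` forwards the Google Test flag `--gtest_filter` the same way it forwards the `--gtest_*` flags used in the report.

```shell
#!/bin/sh
# count_failures RUNS COMMAND [ARGS...]
# Runs COMMAND the given number of times and prints how many runs exited
# with a non-zero status. The function body is a subshell, so its variables
# do not leak into the caller.
count_failures() (
  runs="$1"
  shift
  failed=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Only the exit status matters here, so the command's output is discarded.
    if ! "$@" > /dev/null 2>&1; then
      failed=$((failed + 1))
    fi
    i=$((i + 1))
  done
  echo "$failed"
)

# Example (assumption: run from the build directory, as in the report):
#   count_failures 20 sudo ./bin/mesos-tests.sh \
#     --gtest_filter='SlaveRecoveryTest/0.GCExecutor'
```

Dividing the printed count by the number of runs gives a rough failure rate. A crash such as the SIGSEGV shown above also produces a non-zero exit status, so it is counted as a failed run.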