[ https://issues.apache.org/jira/browse/MESOS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034701#comment-15034701 ]

Jojy Varghese commented on MESOS-4025:
--------------------------------------

On debian8:

{code}
[ RUN      ] SlaveRecoveryTest/0.Reboot
I1201 21:57:11.562711  7964 exec.cpp:136] Version: 0.26.0
I1201 21:57:11.571506  7978 exec.cpp:210] Executor registered on slave 00a179f0-f087-4054-a0c7-c15281d5e7ff-S0
Registered executor on debian8
Starting task 791255fc-88dd-452e-ba12-6b2dfced99a0
Forked command at 7987
sh -c 'sleep 1000'
I1201 21:57:11.640627  7982 exec.cpp:383] Executor asked to shutdown
Shutting down
Sending SIGTERM to process tree at pid 7987
Killing the following process trees:
[ 
-+- 7987 sh -c sleep 1000 
 \--- 7988 sleep 1000 
]
Command terminated with signal Terminated (pid: 7987)
[       OK ] SlaveRecoveryTest/0.Reboot (1730 ms)
[ RUN      ] SlaveRecoveryTest/0.GCExecutor
2015-12-01 21:57:13,187:1473(0x7f9bf4e36700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:44262] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
I1201 21:57:13.296581  8012 exec.cpp:136] Version: 0.26.0
I1201 21:57:13.305498  8028 exec.cpp:210] Executor registered on slave 44a46bd2-d24a-48d6-bd62-492c15845841-S0
Registered executor on debian8
Starting task 8affc624-c95d-43f5-a2b9-967663c3151b
sh -c 'sleep 1000'
Forked command at 8035
../../src/tests/mesos.cpp:781: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave': Device or resource busy
*** Aborted at 1449007033 (unix time) try "date -d @1449007033" if you are using GNU date ***
PC: @          0x14b079e testing::UnitTest::AddTestPartResult()
*** SIGSEGV (@0x0) received by PID 1473 (TID 0x7f9c3db5d7c0) from PID 0; stack trace: ***
    @     0x7f9c28c2166c os::Linux::chained_handler()
    @     0x7f9c28c25a0a JVM_handle_linux_signal
    @     0x7f9c374728d0 (unknown)
    @          0x14b079e testing::UnitTest::AddTestPartResult()
    @          0x14a51d7 testing::internal::AssertHelper::operator=()
    @           0xf564c1 mesos::internal::tests::ContainerizerTest<>::TearDown()
    @          0x14ce2c0 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @          0x14c9238 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x14aa5c0 testing::Test::Run()
    @          0x14aad05 testing::TestInfo::Run()
    @          0x14ab340 testing::TestCase::Run()
    @          0x14b1c8f testing::internal::UnitTestImpl::RunAllTests()
    @          0x14cef4f testing::internal::HandleSehExceptionsInMethodIfSupported<>()
    @          0x14c9d8e testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x14b09bf testing::UnitTest::Run()
    @           0xd63df2 RUN_ALL_TESTS()
    @           0xd639d0 main
    @     0x7f9c370dbb45 (unknown)
    @           0x9588e9 (unknown)

{code}

* The crash occurred in *ContainerizerTest<slave::MesosContainerizer>::TearDown*.
* The assertion *AWAIT_READY(cgroups::destroy(hierarchy, cgroup));* failed with "Device or resource busy". The cgroup in question was */sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave*, as seen in the log above.
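On cgroup v1 (the memory hierarchy in the path above), removing a cgroup directory fails with EBUSY ("Device or resource busy") while the cgroup still has member processes or child cgroups. Below is a minimal bash sketch for checking which of the two applies to a leftover cgroup. The function name `cgroup_busy_report` is hypothetical and not part of Mesos; it assumes the v1 layout, where each cgroup directory contains a `cgroup.procs` file listing member PIDs and each child cgroup is a subdirectory.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of Mesos). Assumes cgroup v1:
# rmdir(2) on a cgroup fails with EBUSY while the cgroup still has member
# processes or child cgroups, so print both.
cgroup_busy_report() {
  local dir="$1"
  if [ ! -r "$dir/cgroup.procs" ]; then
    echo "no cgroup.procs under $dir" >&2
    return 2
  fi

  # Member processes: one PID per line in cgroup.procs.
  local pid
  while read -r pid || [ -n "$pid" ]; do
    [ -n "$pid" ] || continue
    # /proc/<pid>/cmdline is NUL-separated; show it space-separated.
    # Errors (e.g. the process already exited) are silenced.
    printf 'pid %s: %s\n' "$pid" \
      "$(tr '\0' ' ' 2>/dev/null < "/proc/$pid/cmdline")"
  done < "$dir/cgroup.procs"

  # Child cgroups: every subdirectory of the cgroup directory.
  local child
  for child in "$dir"/*/; do
    [ -d "$child" ] && printf 'child cgroup: %s\n' "${child%/}"
  done
  return 0
}
```

For example, running `cgroup_busy_report /sys/fs/cgroup/memory/mesos_test_a894bd47-5e1a-4442-bc6b-303d2aed6945/slave` as root before the leftover cgroup is cleaned up would show whether the task's `sleep 1000` (or its `sh -c` parent) is still attached, or whether a child cgroup remains.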

> SlaveRecoveryTest/0.GCExecutor is flaky.
> ----------------------------------------
>
>                 Key: MESOS-4025
>                 URL: https://issues.apache.org/jira/browse/MESOS-4025
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.26.0
>            Reporter: Till Toenshoff
>              Labels: test
>
> The build was SSL-enabled (--enable-ssl, --enable-libevent) and based on 0.26.0-rc1.
> The test suite was run as root.
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
> {noformat}
> {noformat}
> [ RUN      ] SlaveRecoveryTest/0.GCExecutor
> I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
> I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
> Registered executor on ubuntu14
> Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
> sh -c 'sleep 1000'
> Forked command at 1057
> ../../src/tests/mesos.cpp:779: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': Device or resource busy
> *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are using GNU date ***
> PC: @          0x1443e9a testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; stack trace: ***
>     @     0x7f1be92b80b7 os::Linux::chained_handler()
>     @     0x7f1be92bc219 JVM_handle_linux_signal
>     @     0x7f1bf7bbc340 (unknown)
>     @          0x1443e9a testing::UnitTest::AddTestPartResult()
>     @          0x1438b99 testing::internal::AssertHelper::operator=()
>     @           0xf0b3bb mesos::internal::tests::ContainerizerTest<>::TearDown()
>     @          0x1461882 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x145c6f8 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x143de4a testing::Test::Run()
>     @          0x143e584 testing::TestInfo::Run()
>     @          0x143ebca testing::TestCase::Run()
>     @          0x1445312 testing::internal::UnitTestImpl::RunAllTests()
>     @          0x14624a7 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x145d26e testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x14440ae testing::UnitTest::Run()
>     @           0xd15cd4 RUN_ALL_TESTS()
>     @           0xd158c1 main
>     @     0x7f1bf7808ec5 (unknown)
>     @           0x913009 (unknown)
> {noformat}
> My Vagrantfile generator:
> {noformat}
> #!/usr/bin/env bash
> cat << EOF > Vagrantfile
> # -*- mode: ruby -*-
> # vi: set ft=ruby :
> Vagrant.configure(2) do |config|
>   # Disable shared folder to prevent certain kernel module dependencies.
>   config.vm.synced_folder ".", "/vagrant", disabled: true
>   config.vm.box = "bento/ubuntu-14.04"
>   config.vm.hostname = "${PLATFORM_NAME}"
>   config.vm.provider "virtualbox" do |vb|
>     vb.memory = ${VAGRANT_MEM}
>     vb.cpus = ${VAGRANT_CPUS}
>     vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
>     vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
>     vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
>   end
>   config.vm.provider "vmware_fusion" do |vb|
>     vb.memory = ${VAGRANT_MEM}
>     vb.cpus = ${VAGRANT_CPUS}
>   end
>   config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
>   config.vm.provision "shell", inline: <<-SHELL
>     sudo apt-get update
>     sudo apt-get -y install openjdk-7-jdk autoconf libtool
>     sudo apt-get -y install build-essential python-dev python-boto          \
>                             libcurl4-nss-dev libsasl2-dev maven             \
>                             libapr1-dev libsvn-dev libssl-dev libevent-dev
>     sudo apt-get -y install git
>     sudo wget -qO- https://get.docker.com/ | sh
>   SHELL
> end
> EOF
> {noformat}
> The problem occurs frequently in my tests; I'd say in more than 10% but fewer than 50% of runs.
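A rough way to turn the "between 10% and 50%" estimate into a measured rate is to run only the affected test a fixed number of times and count the failing runs. The following is a minimal, hypothetical bash helper (`flake_rate` is not part of Mesos or gtest); it assumes that a failing run makes the test binary exit with a non-zero status.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Mesos or gtest): run a command N times
# and print "<failures>/<runs>", counting every non-zero exit as a failure.
flake_rate() {
  local runs="$1"
  shift
  local failures=0 i
  for ((i = 0; i < runs; i++)); do
    # Output is discarded so only the final count is printed.
    if ! "$@" > /dev/null 2>&1; then
      failures=$((failures + 1))
    fi
  done
  echo "${failures}/${runs}"
}
```

For example, as root from the build directory: `flake_rate 50 ./bin/mesos-tests.sh --gtest_filter=SlaveRecoveryTest/0.GCExecutor`. Leave out `--gtest_repeat=-1` here, since the loop already does the repeating, and drop the output redirection when the logs of failing runs are needed.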



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
