Benjamin Mahler created MESOS-1154:
--------------------------------------
Summary: Flaky SlaveRecoveryTest test: ReconcileKillTask and
RecoverStatusUpdateManager.
Key: MESOS-1154
URL: https://issues.apache.org/jira/browse/MESOS-1154
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Mahler
Assignee: Ian Downes
Looks like the test tear down is failing to remove a cgroup in both cases:
{noformat: title=SlaveRecoveryTest/0.ReconcileKillTask}
[ RUN ] SlaveRecoveryTest/0.ReconcileKillTask
2014-03-27
22:32:49,330:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 22:32:50.850909 53927 exec.cpp:131] Version: 0.19.0
I0327 22:32:50.853888 53953 exec.cpp:205] Executor registered on slave
20140327-223247-1740121354-49087-44864-0
Registered executor on smfd-bkq-03-sr4.devel.twitter.com
Starting task bc4f5f79-088e-4188-b9b5-3585ba5e6a98
sh -c 'sleep 1000'
Forked command at 53967
2014-03-27
22:32:52,667:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-03-27
22:32:56,004:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
I0327 22:32:56.032625 53957 exec.cpp:251] Received reconnect request from slave
20140327-223247-1740121354-49087-44864-0
I0327 22:32:56.033253 53945 exec.cpp:228] Executor re-registered on slave
20140327-223247-1740121354-49087-44864-0
Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
Shutting down
Sending SIGTERM to process tree at pid 53967
Killing the following process trees:
[
--- 53967 sleep 1000
]
Command terminated with signal Terminated (pid: 53967)
2014-03-27
22:32:59,341:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
../../src/tests/mesos.cpp:387: Failure
(cgroups::destroy(hierarchy, cgroup)).failure():
'mesos_test_cd6a76e9-9961-40ba-b0ed-b1190954975f/0b1f90b8-b5a2-43e9-851e-93ca56cb37d0'
is not a valid cgroup
[ FAILED ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam =
mesos::internal::slave::MesosContainerizer (13600 ms)
{noformat}
{noformat: title=SlaveRecoveryTest/0.RecoverStatusUpdateManager}
[ RUN ] SlaveRecoveryTest/0.RecoverStatusUpdateManager
2014-03-27
22:30:12,509:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-03-27
22:30:15,845:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 22:30:18.038733 52597 exec.cpp:131] Version: 0.19.0
I0327 22:30:18.043248 52620 exec.cpp:205] Executor registered on slave
20140327-223012-1740121354-49087-44864-0
Registered executor on smfd-bkq-03-sr4.devel.twitter.com
Starting task 4c8dda64-17e8-4390-8f31-f878cdde8228
sh -c 'sleep 1000'
Forked command at 52637
2014-03-27
22:30:19,182:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
2014-03-27
22:30:22,519:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
I0327 22:30:24.087932 52634 exec.cpp:251] Received reconnect request from slave
20140327-223012-1740121354-49087-44864-0
I0327 22:30:24.088851 52635 exec.cpp:228] Executor re-registered on slave
20140327-223012-1740121354-49087-44864-0
Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
2014-03-27
22:30:25,855:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697:
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server
refused to accept the client
I0327 22:30:26.091655 52614 exec.cpp:378] Executor asked to shutdown
Shutting down
Sending SIGTERM to process tree at pid 52637
Killing the following process trees:
[
--- 52637 sleep 1000
]
Command terminated with signal Terminated (pid: 52637)
../../src/tests/mesos.cpp:387: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to kill tasks in nested
cgroups: Collect failed:
'mesos_test_87c28b05-8055-407a-8ef5-eda7febd1f1c/302e57d9-e837-46cd-bd81-0e39c5b19564'
is not a valid cgroup
[ FAILED ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam =
mesos::internal::slave::MesosContainerizer (16304 ms)
{noformat}
Seems to be flaky and only occurring sometimes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)