Benjamin Mahler created MESOS-1154:
--------------------------------------

             Summary: Flaky SlaveRecoveryTest test: ReconcileKillTask and 
RecoverStatusUpdateManager.
                 Key: MESOS-1154
                 URL: https://issues.apache.org/jira/browse/MESOS-1154
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Assignee: Ian Downes


Looks like the test tear down is failing to remove a cgroup in both cases:

{noformat: title=SlaveRecoveryTest/0.ReconcileKillTask}
[ RUN      ] SlaveRecoveryTest/0.ReconcileKillTask
2014-03-27 
22:32:49,330:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 22:32:50.850909 53927 exec.cpp:131] Version: 0.19.0
I0327 22:32:50.853888 53953 exec.cpp:205] Executor registered on slave 
20140327-223247-1740121354-49087-44864-0
Registered executor on smfd-bkq-03-sr4.devel.twitter.com
Starting task bc4f5f79-088e-4188-b9b5-3585ba5e6a98
sh -c 'sleep 1000'
Forked command at 53967
2014-03-27 
22:32:52,667:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
2014-03-27 
22:32:56,004:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I0327 22:32:56.032625 53957 exec.cpp:251] Received reconnect request from slave 
20140327-223247-1740121354-49087-44864-0
I0327 22:32:56.033253 53945 exec.cpp:228] Executor re-registered on slave 
20140327-223247-1740121354-49087-44864-0
Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
Shutting down
Sending SIGTERM to process tree at pid 53967
Killing the following process trees:
[
--- 53967 sleep 1000
]
Command terminated with signal Terminated (pid: 53967)
2014-03-27 
22:32:59,341:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
../../src/tests/mesos.cpp:387: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): 
'mesos_test_cd6a76e9-9961-40ba-b0ed-b1190954975f/0b1f90b8-b5a2-43e9-851e-93ca56cb37d0'
 is not a valid cgroup
[  FAILED  ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam = 
mesos::internal::slave::MesosContainerizer (13600 ms)
{noformat}

{noformat: title=SlaveRecoveryTest/0.RecoverStatusUpdateManager}
[ RUN      ] SlaveRecoveryTest/0.RecoverStatusUpdateManager
2014-03-27 
22:30:12,509:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
2014-03-27 
22:30:15,845:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0327 22:30:18.038733 52597 exec.cpp:131] Version: 0.19.0
I0327 22:30:18.043248 52620 exec.cpp:205] Executor registered on slave 
20140327-223012-1740121354-49087-44864-0
Registered executor on smfd-bkq-03-sr4.devel.twitter.com
Starting task 4c8dda64-17e8-4390-8f31-f878cdde8228
sh -c 'sleep 1000'
Forked command at 52637
2014-03-27 
22:30:19,182:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
2014-03-27 
22:30:22,519:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I0327 22:30:24.087932 52634 exec.cpp:251] Received reconnect request from slave 
20140327-223012-1740121354-49087-44864-0
I0327 22:30:24.088851 52635 exec.cpp:228] Executor re-registered on slave 
20140327-223012-1740121354-49087-44864-0
Re-registered executor on smfd-bkq-03-sr4.devel.twitter.com
2014-03-27 
22:30:25,855:44864(0x7fa2c5bab940):ZOO_ERROR@handle_socket_error_msg@1697: 
Socket [127.0.0.1:60875] zk retcode=-4, errno=111(Connection refused): server 
refused to accept the client
I0327 22:30:26.091655 52614 exec.cpp:378] Executor asked to shutdown
Shutting down
Sending SIGTERM to process tree at pid 52637
Killing the following process trees:
[
--- 52637 sleep 1000
]
Command terminated with signal Terminated (pid: 52637)
../../src/tests/mesos.cpp:387: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to kill tasks in nested 
cgroups: Collect failed: 
'mesos_test_87c28b05-8055-407a-8ef5-eda7febd1f1c/302e57d9-e837-46cd-bd81-0e39c5b19564'
 is not a valid cgroup
[  FAILED  ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam = 
mesos::internal::slave::MesosContainerizer (16304 ms)
{noformat}

Seems to be flaky and only occurring sometimes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to