Jay Buffington created MESOS-1824:
-------------------------------------

             Summary: when "docker ps -a" returns 400+ lines enabling docker 
containerizer results in all executors dying
                 Key: MESOS-1824
                 URL: https://issues.apache.org/jira/browse/MESOS-1824
             Project: Mesos
          Issue Type: Bug
          Components: containerization
            Reporter: Jay Buffington


To reproduce:

# run this one-liner on your slave to create 400 exited docker containers:
{noformat}
for i in $(seq 1 400); do docker run busybox:latest echo "hello"; done
{noformat}
# Start mesos-slave with only mesos containerizer enabled
# Launch tasks that use an executor (which uses libmesos)
# Restart the mesos-slave process with --containerizers=docker,mesos
# Observe that mesos-slave forks "docker ps -a", which never returns
# Note that this mesos-slave never reregisters with the master
# Wait at least 10 minutes and see executors commit suicide, which kills all of 
the tasks on your system.  From executor log:
{noformat}
I0919 21:24:14.018127 21778 exec.cpp:379] Executor asked to shutdown
I0919 21:24:14.018812 21771 exec.cpp:78] Scheduling shutdown of the executor
I0919 21:24:14.020514 21778 exec.cpp:394] Executor::shutdown took 1.866382ms
I0919 21:24:16.000500 21771 exec.cpp:525] Executor sending status update 
TASK_KILLED (UUID: bfd3969c-ad0a-455a-93fe-06c37bdee513) for task 
1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 
20140814-221057-1208029356-5050-10525-0000
I0919 21:24:16.030253 21772 exec.cpp:332] Ignoring status update 
acknowledgement bfd3969c-ad0a-455a-93fe-06c37bdee513 for task 
1411160025479-another-task-0-b5e24381-3353-43d4-9587-ffef9ccf2f38 of framework 
20140814-221057-1208029356-5050-10525-0000 because the driver is aborted!
I0919 21:24:19.021966 21778 exec.cpp:86] Committing suicide by killing the 
process group
{noformat}
# mesos-slave fails to tell the master about the tasks being killed, with this 
message in the log:

{noformat}
W0918 01:02:57.252231 11725 status_update_manager.cpp:381] Not
forwarding status update TASK_KILLED (UUID:
6fbacbcf-ad0f-4e89-89ee-e9f88a618573) for task
1410298578043-some-task-30-29279377-fdf2-4bb7-b862-852adddea09c
of framework 20140522-213145-1749004561-5050-29512-0000 because no
master is elected yet
{noformat}
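
To check whether a slave is already in the state that triggers this, the container rows in the "docker ps -a" output can be counted before enabling the docker containerizer. The snippet below is only a sketch: count_containers is a hypothetical helper name (not part of Mesos or Docker), and it assumes the first line of the output is the header row.

```shell
# Hypothetical helper: count container rows in "docker ps -a" output read
# from stdin. Assumes the first line is the column header and skips it.
count_containers() {
  tail -n +2 | wc -l | tr -d ' '
}

# Usage on a slave (requires docker to be installed):
#   docker ps -a | count_containers
```

A result of around 400 or more matches the condition described in the summary.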



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
