Matthias Veit created MESOS-3766:
------------------------------------
Summary: Can not kill task in Status STAGING
Key: MESOS-3766
URL: https://issues.apache.org/jira/browse/MESOS-3766
Project: Mesos
Issue Type: Bug
Components: general
Affects Versions: 0.25.0
Environment: OSX
Reporter: Matthias Veit
I created a simple Marathon application with an instance count of 100 (100
tasks) that runs a simple sleep command. Before all tasks were running, I killed
all tasks. The kill succeeded for all but 2 tasks. These 2 tasks are still in
state STAGING (according to the Mesos UI). Marathon has been trying to kill them
every 5 seconds for over an hour now, without success.
I picked one task and grepped the slave log:
{noformat}
I1020 12:39:38.480478 315482112 slave.cpp:1270] Got assigned task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:39:38.887559 315482112 slave.cpp:1386] Launching task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d for framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:39:38.898221 315482112 slave.cpp:4852] Launching executor
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000 with resour
I1020 12:39:38.899521 315482112 slave.cpp:1604] Queuing task
'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' for executor
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework '80
I1020 12:39:39.740401 313872384 containerizer.cpp:640] Starting container
'5ce75a17-12db-4c8f-9131-b40f8280b9f7' for executor
'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of fr
I1020 12:39:40.495931 313872384 containerizer.cpp:873] Checkpointing executor's
forked pid 37096 to
'/tmp/mesos/meta/slaves/80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0/frameworks
I1020 12:39:41.744439 313335808 slave.cpp:2379] Got registration for executor
'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-000
I1020 12:39:42.080734 313335808 slave.cpp:1760] Sending queued task
'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' to executor
'app.dc98434b-7716-11e5-a5fc-1ea69edef42d' of frame
I1020 12:40:13.073390 312262656 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:18.079651 312262656 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:23.097504 313335808 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:28.118443 313872384 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:33.138137 313335808 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:38.158529 316018688 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:43.177901 314408960 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:48.197852 313872384 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:53.216672 316018688 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:40:58.238471 314945536 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:41:03.256614 312799232 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:41:08.276450 313335808 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:41:13.297114 315482112 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:41:18.316463 316018688 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
I1020 12:41:23.337116 313872384 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
.
.
.
I1020 14:11:03.614157 316018688 slave.cpp:1789] Asked to kill task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000
{noformat}
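For anyone who wants to check their own slave log for the same pattern, here is a minimal sketch that counts the "Asked to kill task" entries per task. It assumes the glog line prefix seen in the excerpt above and re-joins records that were wrapped across lines, as they are in this mail. The function names (join_wrapped, kill_requests) are made up for illustration and are not part of Mesos.

```python
import re
from collections import defaultdict
from datetime import datetime

# Prefix of a log record as seen in the excerpt: severity letter, MMDD, time.
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) ")
# Message text logged by the slave when it receives a kill request.
KILL = re.compile(r"Asked to kill task (\S+) of framework (\S+)")


def join_wrapped(lines):
    """Re-join log records that were wrapped across several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if GLOG_PREFIX.match(line) or not records:
            records.append(line)          # start of a new record
        else:
            records[-1] += " " + line     # continuation of the previous one
    return records


def kill_requests(lines):
    """Map task id -> list of times at which a kill request was logged."""
    seen = defaultdict(list)
    for record in join_wrapped(lines):
        prefix = GLOG_PREFIX.match(record)
        kill = KILL.search(record)
        if prefix and kill:
            # The prefix carries no year, so only the time of day is parsed.
            stamp = datetime.strptime(prefix.group(2), "%H:%M:%S.%f")
            seen[kill.group(1)].append(stamp)
    return dict(seen)


def report(lines):
    """Print how often each task was asked to be killed and over what span."""
    for task, stamps in kill_requests(lines).items():
        span = (max(stamps) - min(stamps)).total_seconds()
        print(f"{task}: {len(stamps)} kill requests over {span:.0f}s")
```

Calling report(open(path)) on a slave log (path being wherever your slave writes its log) lists every task that received kill requests. A task with many requests spread over a long span and no later status change is a candidate for this bug.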
The master log looks like this:
{noformat}
I1020 12:39:38.044208 351387648 master.hpp:176] Adding task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d with resources cpus(*):0.1; mem(*):16;
ports(*):[31232-31232] on slave 80
I1020 12:39:38.044494 351387648 master.cpp:3248] Launching task
app.dc98434b-7716-11e5-a5fc-1ea69edef42d of framework
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-0000 (marathon) at
I1020 12:40:13.061883 350314496 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
I1020 12:40:18.079074 351387648 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
I1020 12:40:23.097110 352460800 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
I1020 12:40:28.117952 352997376 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
I1020 12:40:33.137667 352460800 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
I1020 12:40:38.157832 354070528 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
I1020 12:40:43.177223 353533952 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
.
.
.
I1020 14:11:33.611827 353533952 master.cpp:3482] Telling slave
80ba2050-bf0f-4472-a2f7-2636c4f7b8c8-S0 at slave(1)@127.0.0.1:5051 (localhost)
to kill task app.dc98434b-7716-1
{noformat}
In the sandbox, stdout is empty and stderr has the following content:
{noformat}
I1020 12:39:41.551882 2047558400 exec.cpp:134] Version: 0.25.0
{noformat}
Just for reference, this was the Marathon Application used:
{noformat}
{
  "id": "/app",
  "mem": 16.0,
  "cmd": "sleep 10000",
  "cpus": 0.1,
  "disk": 0.0,
  "env": {
    "foo": "bla"
  }
}
{noformat}
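The definition above does not show the instance count. As a rough sketch for reproducing the setup, the following builds the same definition with the count of 100 mentioned in the description. It assumes Marathon's "instances" field carries that count. The helper name app_definition is hypothetical, and the script only builds the JSON payload; it does not talk to Marathon.

```python
import json


def app_definition(instances=100):
    """Return the application from this report, plus an instance count.

    "instances" is not part of the JSON shown in the report; it is added
    here on the assumption that it is how the 100 tasks were requested.
    """
    return {
        "id": "/app",
        "cmd": "sleep 10000",
        "cpus": 0.1,
        "mem": 16.0,
        "disk": 0.0,
        "instances": instances,
        "env": {"foo": "bla"},
    }


# Serialize the definition so it can be saved to a file or sent to Marathon.
payload = json.dumps(app_definition(), indent=2, sort_keys=True)
```

The payload would then be submitted to Marathon (I would expect via its apps endpoint, but check the API documentation for your version), and the tasks killed before all of them reach RUNNING.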