A. Dukhovniy created MESOS-8051:
-----------------------------------
Summary: Killing TASK_GROUP fails to kill some tasks
Key: MESOS-8051
URL: https://issues.apache.org/jira/browse/MESOS-8051
Project: Mesos
Issue Type: Bug
Components: agent, executor
Affects Versions: 1.4.0
Reporter: A. Dukhovniy
Priority: Critical
When starting the following pod definition via Marathon and then killing it:
{code:java}
{
  "id": "/simple-pod",
  "scaling": {
    "kind": "fixed",
    "instances": 3
  },
  "environment": {
    "PING": "PONG"
  },
  "containers": [
    {
      "name": "ct1",
      "resources": {
        "cpus": 0.1,
        "mem": 32
      },
      "image": {
        "kind": "MESOS",
        "id": "busybox"
      },
      "exec": {
        "command": {
          "shell": "while true; do echo the current time is $(date) > ./test-v1/clock; sleep 1; done"
        }
      },
      "volumeMounts": [
        {
          "name": "v1",
          "mountPath": "test-v1"
        }
      ]
    },
    {
      "name": "ct2",
      "resources": {
        "cpus": 0.1,
        "mem": 32
      },
      "exec": {
        "command": {
          "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 1; done"
        }
      },
      "volumeMounts": [
        {
          "name": "v1",
          "mountPath": "etc"
        },
        {
          "name": "v2",
          "mountPath": "docker"
        }
      ]
    }
  ],
  "networks": [
    {
      "mode": "host"
    }
  ],
  "volumes": [
    {
      "name": "v1"
    },
    {
      "name": "v2",
      "host": "/var/lib/docker"
    }
  ]
}
{code}
Mesos will successfully kill all {{ct2}} containers but fail to kill some or
all of the {{ct1}} containers. I've attached both master and agent logs. The
interesting part starts after Marathon issues 6 kills:
{code:java}
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.209966 4746 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210033 4746 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210471 4748 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210518 4748 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210602 4748 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210639 4748 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210932 4753 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210968 4753 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211210 4747 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211251 4747 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct1 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211474 4746 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211514 4746 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct2 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101
{code}
All {{*.ct1}} tasks FAILED, whereas all {{*.ct2}} tasks were successfully killed.
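
To cross-check that the master received a KILL call for every container of every pod instance, the master log can be grouped by instance. This is a minimal sketch that assumes the "Processing KILL call" line format shown in the excerpt above. The helper name {{kills_by_instance}} and the abbreviated sample are illustrative only and are not part of Mesos or Marathon.

```python
import re
from collections import defaultdict

# Matches the master's "Processing KILL call" line as it appears in the
# excerpt. A task ID may be split by a line wrap, so whitespace inside the
# quotes is tolerated here and stripped afterwards.
KILL_RE = re.compile(r"Processing KILL call for task\s+'([^']+)'")


def kills_by_instance(log_text):
    """Group container names (ct1, ct2, ...) by pod instance ID."""
    grouped = defaultdict(set)
    for match in KILL_RE.finditer(log_text):
        task_id = re.sub(r"\s+", "", match.group(1))
        # Assumption: the container name is the last dot-separated
        # component of the task ID, as in "<instance>.ct1".
        instance, _, container = task_id.rpartition(".")
        grouped[instance].add(container)
    return dict(grouped)


# Abbreviated sample taken from the master log excerpt (trailing fields cut).
sample = """\
I1004 14:58:25.209966 4746 master.cpp:5297] Processing KILL call for task
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
I1004 14:58:25.210471 4748 master.cpp:5297] Processing KILL call for task
'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
bf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
"""

for instance, containers in kills_by_instance(sample).items():
    print(instance, sorted(containers))
```

Run against the full master log, an instance whose set lacks {{ct1}} or {{ct2}} would mean the kill never reached the master. In the excerpt all six kills are present, which suggests the problem lies further down, on the agent or executor side.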