[
https://issues.apache.org/jira/browse/MESOS-8051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
A. Dukhovniy updated MESOS-8051:
--------------------------------
Attachment: dcos-mesos-master.log.gz
dcos-mesos-slave.log.gz
Master and agent logs
> Killing TASK_GROUP fails to kill some tasks
> -------------------------------------------
>
> Key: MESOS-8051
> URL: https://issues.apache.org/jira/browse/MESOS-8051
> Project: Mesos
> Issue Type: Bug
> Components: agent, executor
> Affects Versions: 1.4.0
> Reporter: A. Dukhovniy
> Priority: Critical
> Attachments: dcos-mesos-master.log.gz, dcos-mesos-slave.log.gz
>
>
> When starting the following pod definition via Marathon:
> {code:java}
> {
>   "id": "/simple-pod",
>   "scaling": {
>     "kind": "fixed",
>     "instances": 3
>   },
>   "environment": {
>     "PING": "PONG"
>   },
>   "containers": [
>     {
>       "name": "ct1",
>       "resources": {
>         "cpus": 0.1,
>         "mem": 32
>       },
>       "image": {
>         "kind": "MESOS",
>         "id": "busybox"
>       },
>       "exec": {
>         "command": {
>           "shell": "while true; do echo the current time is $(date) > ./test-v1/clock; sleep 1; done"
>         }
>       },
>       "volumeMounts": [
>         {
>           "name": "v1",
>           "mountPath": "test-v1"
>         }
>       ]
>     },
>     {
>       "name": "ct2",
>       "resources": {
>         "cpus": 0.1,
>         "mem": 32
>       },
>       "exec": {
>         "command": {
>           "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 1; done"
>         }
>       },
>       "volumeMounts": [
>         {
>           "name": "v1",
>           "mountPath": "etc"
>         },
>         {
>           "name": "v2",
>           "mountPath": "docker"
>         }
>       ]
>     }
>   ],
>   "networks": [
>     {
>       "mode": "host"
>     }
>   ],
>   "volumes": [
>     {
>       "name": "v1"
>     },
>     {
>       "name": "v2",
>       "host": "/var/lib/docker"
>     }
>   ]
> }
> {code}
> Mesos will successfully kill all {{ct2}} containers but fail to kill some or
> all of the {{ct1}} containers. I've attached both master and agent logs. The
> interesting part starts after Marathon issues 6 kills:
> {code:java}
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.209966 4746 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1'
> of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
> at [email protected]:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.210033 4746 master.cpp:5371] Telling
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051
> (10.0.1.207) to kill task
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at
> [email protected]
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.210471 4748 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2'
> of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
> at [email protected]:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.210518 4748 master.cpp:5371] Telling
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051
> (10.0.1.207) to kill task
> simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at
> [email protected]
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.210602 4748 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1'
> of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
> at [email protected]:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.210639 4748 master.cpp:5371] Telling
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051
> (10.0.1.207) to kill task
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at
> [email protected]
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.210932 4753 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2'
> of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
> at [email protected]:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.210968 4753 master.cpp:5371] Telling
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051
> (10.0.1.207) to kill task
> simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2 of framework
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at
> [email protected]
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.211210 4747 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct1'
> of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
> at [email protected]:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.211251 4747 master.cpp:5371] Telling
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051
> (10.0.1.207) to kill task
> simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct1 of framework
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at
> [email protected]
> .229:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.211474 4746 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct2'
> of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
> at [email protected]:15101
> Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal
> mesos-master[4708]: I1004 14:58:25.211514 4746 master.cpp:5371] Telling
> agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051
> (10.0.1.207) to kill task
> simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct2 of framework
> bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at
> [email protected]
> .229:15101
> {code}
> All {{.ct1}} tasks ended up FAILED, whereas all {{.ct2}} tasks were successfully killed.
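
As a rough aid for checking the "6 kills" observation against the attached master log, the Python sketch below counts `Processing KILL call` entries per container. It is not part of the original report. The function names (`kill_calls`, `kills_per_container`) are hypothetical, and the sketch assumes the master log line format shown in the excerpt above and that the container name is the suffix after the last dot of the task ID. It also strips mail quote prefixes and tolerates IDs that a mail client has wrapped across lines.

```python
import re
from collections import Counter

# Assumption: KILL calls appear in the master log as
# "Processing KILL call for task '<task id>'", as in the excerpt.
KILL_RE = re.compile(r"Processing\s+KILL\s+call\s+for\s+task\s+'([^']+)'")


def kill_calls(log_text):
    """Return the task IDs named in KILL calls, in log order."""
    # Drop mail quote prefixes ("> ") at the start of each line.
    unquoted = re.sub(r"(?m)^> ?", "", log_text)
    ids = []
    for match in KILL_RE.finditer(unquoted):
        # Remove whitespace that line wrapping may have put inside the ID.
        ids.append(re.sub(r"\s+", "", match.group(1)))
    return ids


def kills_per_container(task_ids):
    """Count KILL calls by container name (suffix after the last dot)."""
    return Counter(task_id.rsplit(".", 1)[-1] for task_id in task_ids)


# Two entries taken from the excerpt; the first ID is wrapped mid-token.
sample = """\
> mesos-master[4708]: I1004 14:58:25.209966 4746 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853d
> bf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
> mesos-master[4708]: I1004 14:58:25.210471 4748 master.cpp:5297] Processing
> KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2'
> of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon)
"""

print(kills_per_container(kill_calls(sample)))
```

Run over the full excerpt, this should report three KILL calls each for `ct1` and `ct2`, which matches the six kills described. It says nothing about why the `ct1` tasks ended up FAILED; that still requires the agent and executor logs.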