A. Dukhovniy created MESOS-8051:
-----------------------------------

             Summary: Killing TASK_GROUP fails to kill some tasks
                 Key: MESOS-8051
                 URL: https://issues.apache.org/jira/browse/MESOS-8051
             Project: Mesos
          Issue Type: Bug
          Components: agent, executor
    Affects Versions: 1.4.0
            Reporter: A. Dukhovniy
            Priority: Critical


When starting the following pod definition via Marathon:

{code:java}
{
  "id": "/simple-pod",
  "scaling": {
    "kind": "fixed",
    "instances": 3
  },
  "environment": {
    "PING": "PONG"
  },
  "containers": [
    {
      "name": "ct1",
      "resources": {
        "cpus": 0.1,
        "mem": 32
      },
      "image": {
        "kind": "MESOS",
        "id": "busybox"
      },
      "exec": {
        "command": {
          "shell": "while true; do echo the current time is $(date) > ./test-v1/clock; sleep 1; done"
        }
      },
      "volumeMounts": [
        {
          "name": "v1",
          "mountPath": "test-v1"
        }
      ]
    },
    {
      "name": "ct2",
      "resources": {
        "cpus": 0.1,
        "mem": 32
      },
      "exec": {
        "command": {
          "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 1; done"
        }
      },
      "volumeMounts": [
        {
          "name": "v1",
          "mountPath": "etc"
        },
        {
          "name": "v2",
          "mountPath": "docker"
        }
      ]
    }
  ],
  "networks": [
    {
      "mode": "host"
    }
  ],
  "volumes": [
    {
      "name": "v1"
    },
    {
      "name": "v2",
      "host": "/var/lib/docker"
    }
  ]
}
{code}

Mesos will successfully kill all {{ct2}} containers but fail to kill some or all 
of the {{ct1}} containers. I've attached both master and agent logs. The 
interesting part starts after Marathon issues 6 kills:

{code:java}
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.209966  4746 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210033  4746 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210471  4748 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210518  4748 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210602  4748 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210639  4748 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct1 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210932  4753 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.210968  4753 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-3c0ffca4-a914-11e7-bcd5-e63c853dbf20.ct2 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211210  4747 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct1' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211251  4747 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct1 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101

Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211474  4746 master.cpp:5297] Processing KILL call for task 'simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct2' of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected]:15101
Oct 04 14:58:25 ip-10-0-5-229.eu-central-1.compute.internal mesos-master[4708]: I1004 14:58:25.211514  4746 master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 at slave(1)@10.0.1.207:5051 (10.0.1.207) to kill task simple-pod.instance-328cd633-a914-11e7-bcd5-e63c853dbf20.ct2 of framework bae11d5d-20c2-4d66-9ec3-773d1d717e58-0001 (marathon) at [email protected].229:15101
{code}

All {{*.ct1}} tasks FAILED, whereas all {{*.ct2}} tasks were successfully killed.
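
To cross-check that every container of every pod instance actually received a KILL call, the master log can be tallied with a short script. The following is only a rough sketch: it assumes each log entry sits on a single line and that task IDs have the form {{<pod>.instance-<uuid>.<container>}} as seen above. The function name {{kills_by_instance}} and the abbreviated sample lines are illustrative and not part of Mesos or Marathon.

```python
import re
from collections import defaultdict

# Matches the quoted task ID in a master "Processing KILL call" log line.
KILL_RE = re.compile(r"Processing KILL call for task '(?P<task>[^']+)'")


def kills_by_instance(log_lines):
    """Map each pod instance to the set of container names that got a KILL call."""
    kills = defaultdict(set)
    for line in log_lines:
        match = KILL_RE.search(line)
        if match is None:
            continue  # e.g. "Telling agent ..." lines are ignored here
        # Assumed task ID layout: '<pod>.instance-<uuid>.<container>'.
        instance, _, container = match.group("task").rpartition(".")
        kills[instance].add(container)
    return dict(kills)


# Abbreviated sample lines (illustrative); real input would be the master log.
sample = [
    "master.cpp:5297] Processing KILL call for task "
    "'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1' of framework",
    "master.cpp:5371] Telling agent bae11d5d-20c2-4d66-9ec3-773d1d717e58-S1 "
    "to kill task simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct1",
    "master.cpp:5297] Processing KILL call for task "
    "'simple-pod.instance-3c1098e5-a914-11e7-bcd5-e63c853dbf20.ct2' of framework",
]

if __name__ == "__main__":
    for instance, containers in sorted(kills_by_instance(sample).items()):
        print(instance, sorted(containers))
```

For the log excerpt above, such a tally should list three instances with both {{ct1}} and {{ct2}} each, which matches the 6 kills issued by Marathon.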



