[jira] [Created] (MESOS-8756) Missing reasons for early task failures

A. Dukhovniy (JIRA) Tue, 03 Apr 2018 04:36:15 -0700

A. Dukhovniy created MESOS-8756:
-----------------------------------

             Summary: Missing reasons for early task failures
                 Key: MESOS-8756
                 URL: https://issues.apache.org/jira/browse/MESOS-8756
             Project: Mesos
          Issue Type: Bug
          Components: executor, master
    Affects Versions: 1.6.0
            Reporter: A. Dukhovniy



Some early task failures are not propagated to the framework. Here is an 
example of a Marathon pod (Mesos containerizer) definition with a non-existent 
image:
{code:java}
{
  "id": "/fail",
  "containers": [
    {
      "name": "container-1",
      "resources": {
        "cpus": 0.1,
        "mem": 128
      },
      "image": {
        "id": "non-existing-image-56789",
        "kind": "DOCKER"
      }
    }
  ],
  "scaling": {
    "instances": 1,
    "kind": "fixed"
  },
  "networks": [
    {
      "mode": "host"
    }
  ],
  "volumes": [],
  "fetch": [],
  "scheduling": {
    "placement": {
      "constraints": []
    }
  }
}
{code}
In this case, the framework receives the status update {{TASK_FAILED (Executor 
terminated)}}, which says nothing about the missing image.

Here is another example, where a non-existent artifact is being fetched:
{code:java}
{
  "id": "/fail2",
  "containers": [
    {
      "name": "container-1",
      "resources": {
        "cpus": 0.1,
        "mem": 128
      },
      "image": {
        "id": "nginx",
        "kind": "DOCKER",
        "forcePull": false
      },
      "artifacts": [
        {
          "uri": "http://example.com/smth-non-existing-12345.tar.gz"
        }
      ]
    }
  ],
  "scaling": {
    "instances": 1,
    "kind": "fixed"
  },
  "networks": [
    {
      "mode": "host"
    }
  ],
  "volumes": [],
  "fetch": [],
  "scheduling": {
    "placement": {
      "constraints": []
    }
  }
}
{code}
This results in the same status update as above, with no indication that the 
fetch failed.

Frameworks (and their users) should always receive meaningful task failure 
reasons, no matter where the failure happened. Otherwise, the only way to 
find out what happened is to grep the agent logs.
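To illustrate the impact on the framework side, here is a minimal Python sketch. It assumes a simplified, hypothetical {{StatusUpdate}} record whose {{state}}, {{reason}} and {{message}} fields mirror the fields of the same name in a Mesos {{TaskStatus}}; {{describe_failure}} and {{GENERIC_REASONS}} are illustrative names, not part of any Mesos or Marathon API. The point is that when an update carries only the generic executor-termination reason, the best a framework can do is tell the user to go read the agent logs.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified stand-in for the fields a framework reads from a
# task status update. This is NOT the Mesos protobuf API.
@dataclass
class StatusUpdate:
    state: str             # e.g. "TASK_FAILED"
    reason: Optional[str]  # e.g. "REASON_EXECUTOR_TERMINATED", may be absent
    message: str           # free-form text, e.g. "Executor terminated"


# Assumed set of reasons that only say *that* the task died, not *why*.
GENERIC_REASONS = {"REASON_EXECUTOR_TERMINATED"}


def describe_failure(update: StatusUpdate) -> str:
    """Return a user-facing explanation for a TASK_FAILED update."""
    if update.state != "TASK_FAILED":
        raise ValueError(f"expected TASK_FAILED, got {update.state}")
    if update.reason is None or update.reason in GENERIC_REASONS:
        # The situation described in this issue: the root cause (missing
        # image, failed fetch) never reaches the framework.
        return f"{update.message} (root cause unknown; check agent logs)"
    # A specific reason and message can be shown to the user directly.
    return f"{update.message} ({update.reason})"
```

With the update from the first pod above ({{TASK_FAILED}}, {{REASON_EXECUTOR_TERMINATED}}, "Executor terminated"), the sketch can only produce "Executor terminated (root cause unknown; check agent logs)". If the update instead carried a specific reason and a descriptive message, that text could be surfaced to the user without anyone touching the agent.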



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
