James Peach created MESOS-8440:
----------------------------------

             Summary: `network/ports` isolator kills legitimate tasks on recovery.
                 Key: MESOS-8440
                 URL: https://issues.apache.org/jira/browse/MESOS-8440
             Project: Mesos
          Issue Type: Bug
          Components: containerization
    Affects Versions: 1.5.0
            Reporter: James Peach
            Assignee: James Peach


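At recovery time, the containerizer sends the isolator all the resources
*except* the ports. As a result, the ports check races against the subsequent
resources update and can kill a task that is legitimately listening on a port
it was allocated. The root cause is that only the executor's resources are
provided at recovery time, whereas at update time the isolator receives the
whole container's resources as calculated by
{{Executor::allocatedResources()}}. In the log below, the isolator first
records an empty port set ({{[]}}) for the recovered container and only later
updates it to {{[31446-31446]}}.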

{noformat}
I0112 08:22:23.930830 28937 linux_launcher.cpp:300] Recovered container 
80a2d9dc-0492-4af5-a131-05f1cd66d672
I0112 08:22:23.931637 28933 ports.cpp:398] recovering container executor_info {
  executor_id {
    value: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
  }
  resources {
    name: "cpus"
    type: SCALAR
    scalar {
      value: 0.1
    }
    allocation_info {
      role: "*"
    }
  }
  resources {
    name: "mem"
    type: SCALAR
    scalar {
      value: 32
    }
    allocation_info {
      role: "*"
    }
  }
  command {
    value: "/home/jpeach/src/mesos/build/src/mesos-executor"
    shell: false
    arguments: "mesos-executor"
    arguments: "--launcher_dir=/home/jpeach/src/mesos/build/src"
  }
  framework_id {
    value: "4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000"
  }
  name: "Command Executor (Task: fff42f68-4aed-4ca6-a62f-71b7166bbd7a) 
(Command: sh -c \'nc -k -l 31446\')"
  source: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
}
container_id {
  value: "80a2d9dc-0492-4af5-a131-05f1cd66d672"
}
pid: 28955
directory: 
"/tmp/NetworkPortsIsolatorTest_ROOT_NC_RecoverGoodTask_eTlVKl/slaves/4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0/frameworks/4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000/executors/fff42f68-4aed-4ca6-a62f-71b7166bbd7a/runs/80a2d9dc-0492-4af5-a131-05f1cd66d672"
I0112 08:22:23.932137 28933 ports.cpp:530] Updated ports to [] for container 
80a2d9dc-0492-4af5-a131-05f1cd66d672
I0112 08:22:23.932982 28937 provisioner.cpp:493] Provisioner recovery complete
I0112 08:22:23.933924 28928 slave.cpp:6581] Sending reconnect request to 
executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework 
4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000 at executor(1)@17.228.224.108:42187
I0112 08:22:23.934587 28957 exec.cpp:282] Received reconnect request from agent 
4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
I0112 08:22:23.935724 28931 slave.cpp:4426] Received re-registration message 
from executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework 
4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000
I0112 08:22:23.936646 28967 exec.cpp:259] Executor re-registered on agent 
4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
I0112 08:22:23.936820 28929 ports.cpp:530] Updated ports to [31446-31446] for 
container 80a2d9dc-0492-4af5-a131-05f1cd66d672
{noformat}
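
The race in the log above can be reproduced with a small model. This is only a
sketch and not the Mesos code: the names PortsIsolatorModel, Info, simulate and
the defer_until_update switch are hypothetical and were introduced for
illustration. The container ID and port come from the log. The switch shows one
conceivable mitigation (skip the check until the first update has supplied the
full container resources); it is an assumption, not necessarily the fix that
will be adopted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Values taken from the log in this report.
CONTAINER = "80a2d9dc-0492-4af5-a131-05f1cd66d672"
TASK_PORT = 31446


@dataclass
class Info:
    # None means "ports not known yet"; an empty set means "no ports allowed".
    allowed: Optional[Set[int]] = None


class PortsIsolatorModel:
    """Toy model of a ports isolator (hypothetical, not the Mesos class)."""

    def __init__(self, defer_until_update: bool) -> None:
        self.defer_until_update = defer_until_update
        self.infos: Dict[str, Info] = {}

    def recover(self, container_id: str, executor_ports: Set[int]) -> None:
        # Recovery only sees the executor's own resources, which carry no
        # ports for a command task, so the allowed set ends up empty unless
        # we record "unknown" instead.
        allowed = None if self.defer_until_update else set(executor_ports)
        self.infos[container_id] = Info(allowed)

    def update(self, container_id: str, container_ports: Set[int]) -> None:
        # The update carries the whole container's resources, ports included.
        self.infos[container_id].allowed = set(container_ports)

    def check(self, listeners: Dict[str, Set[int]]) -> List[str]:
        """Return the containers listening on ports they do not hold."""
        violators = []
        for container_id, info in self.infos.items():
            if info.allowed is None:
                continue  # Ports unknown: nothing to compare against yet.
            if listeners.get(container_id, set()) - info.allowed:
                violators.append(container_id)
        return violators


def simulate(defer_until_update: bool) -> Tuple[List[str], List[str]]:
    """Run the ports check once before and once after the resources update."""
    isolator = PortsIsolatorModel(defer_until_update)
    listeners = {CONTAINER: {TASK_PORT}}  # the task runs `nc -k -l 31446`

    isolator.recover(CONTAINER, executor_ports=set())
    before = isolator.check(listeners)  # check wins the race

    isolator.update(CONTAINER, container_ports={TASK_PORT})
    after = isolator.check(listeners)   # check runs after the update
    return before, after
```

With defer_until_update=False, the check that runs before the update reports
the recovered container as a violator even though the task holds port 31446.
With defer_until_update=True, neither check reports it.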



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
