James Peach created MESOS-8440:
----------------------------------
Summary: `network/ports` isolator kills legitimate tasks on recovery.
Key: MESOS-8440
URL: https://issues.apache.org/jira/browse/MESOS-8440
Project: Mesos
Issue Type: Bug
Components: containerization
Affects Versions: 1.5.0
Reporter: James Peach
Assignee: James Peach
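At recovery time, the containerizer passes the isolator all of the container's
resources *except* the ports, so the ports check races against the subsequent
resources update. The root cause is that only the executor resources are
provided at recovery time, whereas at update time the isolator gets the whole
container resources as calculated by {{Executor::allocatedResources()}}.
In the log below, the recovered container's ports are first set to [] and only
become [31446-31446] after the executor re-registers. A ports check that runs in
between sees the task's legitimate listener on port 31446 as unallocated.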
{noformat}
I0112 08:22:23.930830 28937 linux_launcher.cpp:300] Recovered container
80a2d9dc-0492-4af5-a131-05f1cd66d672
I0112 08:22:23.931637 28933 ports.cpp:398] recovering container executor_info {
  executor_id {
    value: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
  }
  resources {
    name: "cpus"
    type: SCALAR
    scalar {
      value: 0.1
    }
    allocation_info {
      role: "*"
    }
  }
  resources {
    name: "mem"
    type: SCALAR
    scalar {
      value: 32
    }
    allocation_info {
      role: "*"
    }
  }
  command {
    value: "/home/jpeach/src/mesos/build/src/mesos-executor"
    shell: false
    arguments: "mesos-executor"
    arguments: "--launcher_dir=/home/jpeach/src/mesos/build/src"
  }
  framework_id {
    value: "4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000"
  }
  name: "Command Executor (Task: fff42f68-4aed-4ca6-a62f-71b7166bbd7a) (Command: sh -c \'nc -k -l 31446\')"
  source: "fff42f68-4aed-4ca6-a62f-71b7166bbd7a"
}
container_id {
  value: "80a2d9dc-0492-4af5-a131-05f1cd66d672"
}
pid: 28955
directory: "/tmp/NetworkPortsIsolatorTest_ROOT_NC_RecoverGoodTask_eTlVKl/slaves/4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0/frameworks/4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000/executors/fff42f68-4aed-4ca6-a62f-71b7166bbd7a/runs/80a2d9dc-0492-4af5-a131-05f1cd66d672"
I0112 08:22:23.932137 28933 ports.cpp:530] Updated ports to [] for container
80a2d9dc-0492-4af5-a131-05f1cd66d672
I0112 08:22:23.932982 28937 provisioner.cpp:493] Provisioner recovery complete
I0112 08:22:23.933924 28928 slave.cpp:6581] Sending reconnect request to
executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework
4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000 at executor(1)@17.228.224.108:42187
I0112 08:22:23.934587 28957 exec.cpp:282] Received reconnect request from agent
4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
I0112 08:22:23.935724 28931 slave.cpp:4426] Received re-registration message
from executor 'fff42f68-4aed-4ca6-a62f-71b7166bbd7a' of framework
4ad59c30-7b1e-4991-bda2-e7f9275d3693-0000
I0112 08:22:23.936646 28967 exec.cpp:259] Executor re-registered on agent
4ad59c30-7b1e-4991-bda2-e7f9275d3693-S0
I0112 08:22:23.936820 28929 ports.cpp:530] Updated ports to [31446-31446] for
container 80a2d9dc-0492-4af5-a131-05f1cd66d672
{noformat}
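The sequence in the log above can be sketched roughly as follows. This is a
minimal illustration, not the isolator's actual code: {{ContainerPorts}} and
{{unallocatedPorts}} are hypothetical names, and the plain port sets stand in
for whatever the isolator derives from the container's resources.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for the per-container state kept by the isolator.
struct ContainerPorts {
  // Ports the container is currently considered to be allocated.
  // Empty right after recovery, because the recovered executor
  // resources carry no ports (see "Updated ports to []" in the log).
  std::set<uint16_t> allocated;
};

// Returns the listening ports that are not in the allocated set.
// A non-empty result is what would make the ports check flag the container.
std::vector<uint16_t> unallocatedPorts(
    const ContainerPorts& container,
    const std::set<uint16_t>& listening)
{
  std::vector<uint16_t> result;
  for (uint16_t port : listening) {
    if (container.allocated.count(port) == 0) {
      result.push_back(port);
    }
  }
  return result;
}
```

With an empty {{allocated}} set (the state right after recovery), a listener on
31446 is reported as unallocated. Once the resources update adds 31446, the
same check reports nothing, so the outcome depends only on which of the two
happens first.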
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)