[
https://issues.apache.org/jira/browse/MESOS-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Klaus Ma updated MESOS-4442:
----------------------------
Description:
Here's the time sequence of this issue:
T1: the cluster has {{cpus=2}}: one revocable and one non-revocable
T2: framework1 gets an offer of {{cpus=2}} and launches a task, but the estimator
reports empty resources before {{RunTaskMessage}} arrives at the slave/agent
T3: {{slave.total}} is updated to cpus=1 in
{{HierarchicalAllocatorProcess::updateSlave}}
T4: in {{allocate()}}, slave.total (cpus=1) < slave.allocated (cpus=2)
At T4, the state of the allocator is incorrect (slave.total < slave.allocated).
Here's the log based on my test:
{code}
I0121 16:12:13.023401 1064960 hierarchical.cpp:528] Slave
aa092710-e73e-4f1c-a5ff-4a8542944d41-S0 (9.181.90.153) updated with
oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024;
ports(*):[31000-32000]; cpus(*){REV}:2)
{code}
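The T1 to T4 sequence can be replayed with a minimal sketch. This is a simplified, hypothetical model, not the Mesos code: {{AgentCpus}}, {{offerAll}}, {{updateOversubscribed}} and {{replay}} are illustrative names, and the sketch assumes that an estimator update replaces only the revocable part of the total and leaves {{allocated}} untouched.

```cpp
#include <cassert>

// Hypothetical, simplified model of one agent's CPU accounting in the
// allocator. It is NOT the Mesos Resources class.
struct AgentCpus {
  double nonRevocable;  // CPUs reported when the agent registered
  double revocable;     // latest oversubscription estimate
  double allocated;     // CPUs currently offered to or used by frameworks

  double total() const { return nonRevocable + revocable; }

  // The invariant that this issue reports as violated.
  bool consistent() const { return allocated <= total(); }
};

// T2: offer everything that is not yet allocated.
double offerAll(AgentCpus& agent) {
  double offered = agent.total() - agent.allocated;
  agent.allocated += offered;
  return offered;
}

// T3 (assumed behaviour): the estimator's new report replaces the
// revocable part of the total; `allocated` is not adjusted.
void updateOversubscribed(AgentCpus& agent, double revocable) {
  agent.revocable = revocable;
}

// Replays T1..T4 and returns the state that allocate() would see.
AgentCpus replay() {
  AgentCpus agent{1.0, 1.0, 0.0};    // T1: cpus=2, one of them revocable
  offerAll(agent);                   // T2: framework1 is offered cpus=2
  updateOversubscribed(agent, 0.0);  // T3: estimator reports empty resources
  return agent;                      // T4: total=1, allocated=2
}
```

In this model, {{replay()}} ends with {{total() == 1}} and {{allocated == 2}}, so {{consistent()}} is false, which is the state described at T4.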
was:
Here's the time sequence of this issue:
T1: the cluster has {{cpus=2}}: one revocable and one non-revocable
T2: framework1 gets an offer of {{cpus=2}}, but does NOT launch tasks
T3: the estimator reports empty resources; slave.total is updated to cpus=1 in
{{HierarchicalAllocatorProcess::updateSlave}}
T4: in {{allocate()}}, slave.total (cpus=1) < slave.allocated (cpus=2); the
cpus=1 will be re-offered to the framework because {{operator-}} returns the
first item if {{subtractable()}} is false.
At T4, the state of the allocator is incorrect (slave.total < slave.allocated).
> `allocated` may have more resources than `total` in allocator
> -------------------------------------------------------------
>
> Key: MESOS-4442
> URL: https://issues.apache.org/jira/browse/MESOS-4442
> Project: Mesos
> Issue Type: Bug
> Components: master
> Reporter: Klaus Ma
> Assignee: Klaus Ma
>
> Here's the time sequence of this issue:
> T1: the cluster has {{cpus=2}}: one revocable and one non-revocable
> T2: framework1 gets an offer of {{cpus=2}} and launches a task, but the estimator
> reports empty resources before {{RunTaskMessage}} arrives at the slave/agent
> T3: {{slave.total}} is updated to cpus=1 in
> {{HierarchicalAllocatorProcess::updateSlave}}
> T4: in {{allocate()}}, slave.total (cpus=1) < slave.allocated (cpus=2)
> At T4, the state of the allocator is incorrect (slave.total < slave.allocated).
> Here's the log based on my test:
> {code}
> I0121 16:12:13.023401 1064960 hierarchical.cpp:528] Slave
> aa092710-e73e-4f1c-a5ff-4a8542944d41-S0 (9.181.90.153) updated with
> oversubscribed resources (total: cpus(*):2; mem(*):1024; disk(*):1024;
> ports(*):[31000-32000], allocated: cpus(*):2; mem(*):1024; disk(*):1024;
> ports(*):[31000-32000]; cpus(*){REV}:2)
> {code}