[
https://issues.apache.org/jira/browse/MESOS-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054700#comment-16054700
]
Yan Xu edited comment on MESOS-7639 at 6/19/17 8:42 PM:
--------------------------------------------------------
I think you are right that you stopped seeing the crash because
{quote}it now updates the frameworkSorter by offeredResources rather than
frameworkAllocation{quote}
In your test the following is happening:
1. {{updateSlave}} changes {{cpus\(\*\)\{REV\}:10}} to {{cpus\(\*\)\{REV\}:8}}
in totals.
2. {{RESERVE}} changes {{cpus\(\*\)\{REV\}:10}} to
{{cpus(default-role)\{REV\}:10}} in allocations.
When you operate on the full framework allocation, the reserve includes the
revocable resources, which fails the CHECK. When you operate on the offered
resources, the check doesn't include the revocable resources, so it passes.
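The difference between the two checks can be illustrated with a toy model. This is only a sketch: {{Resources}} below is a hypothetical map from resource name to quantity, not the Mesos {{Resources}} class, {{contains}} merely stands in for the allocator's CHECK, and the non-revocable quantities are made-up numbers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a resource set: name -> quantity.
// This is NOT the Mesos `Resources` class.
using Resources = std::map<std::string, double>;

// Returns true if `total` holds at least as much of every resource
// in `subset`. Stands in for the containment CHECK in the allocator.
bool contains(const Resources& total, const Resources& subset) {
  for (const auto& entry : subset) {
    auto it = total.find(entry.first);
    if (it == total.end() || it->second < entry.second) {
      return false;
    }
  }
  return true;
}

// Totals after `updateSlave` shrank the revocable cpus from 10 to 8.
// The 4 non-revocable cpus are an assumption of this sketch.
Resources totalAfterUpdate() {
  return {{"cpus(*){REV}", 8.0}, {"cpus(*)", 4.0}};
}

// Full framework allocation: still carries the stale 10 revocable cpus.
Resources frameworkAllocation() {
  return {{"cpus(*){REV}", 10.0}, {"cpus(*)", 2.0}};
}

// Resources in the offer being accepted. Assumed here to hold only
// non-revocable cpus, so the revocable ones never reach the check.
Resources offeredResources() {
  return {{"cpus(*)", 2.0}};
}
```

With these numbers, {{contains(totalAfterUpdate(), frameworkAllocation())}} is false, which corresponds to the failed CHECK, while {{contains(totalAfterUpdate(), offeredResources())}} is true even though the allocation still exceeds the total.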
However, I don't think the situation is corrected by {{Master::_accept}},
because it doesn't update totals. Eventually the total will be corrected by
[this|https://github.com/apache/mesos/blob/e00cceda4c31b71017cc9db860e3cf038bbf1d77/src/master/allocator/mesos/hierarchical.cpp#L668]
but in the meantime I think a task launched with the over-allocated revocable
cpus is going to cause trouble on the agent (it doesn't look like this is
caught by the current master validation). Perhaps you can change your test to
verify task launch instead of reservation?
We can probably change the master validation to catch this, but in general I
feel we should make sure that "all allocator operations should atomically
maintain the consistency of its internal state". Relying on a followup
operation to fix the inconsistent state is problematic, and it is hard to
troubleshoot when it doesn't crash but instead messes with the allocator math
in a subtle way. However, this is a design limitation of the current allocator
API and harder to fix.
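To make the invariant concrete, here is a minimal sketch of an operation that shrinks the total and reconciles the allocation in the same step. {{ToyAllocator}} and {{shrinkTotal}} are hypothetical names for illustration only; the real allocator API is not structured this way, which is the limitation described above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical toy allocator tracking a single scalar resource.
// Not the Mesos allocator; names and behavior are assumptions.
struct ToyAllocator {
  double total = 0.0;
  double allocated = 0.0;

  // The invariant every operation should preserve on return.
  bool consistent() const { return allocated <= total; }

  // Shrinks the total and, in the same step, clamps the allocation.
  // Returns the amount that had to be reclaimed so the caller can
  // decide what to do with it.
  double shrinkTotal(double newTotal) {
    total = newTotal;
    double reclaimed = std::max(0.0, allocated - total);
    allocated -= reclaimed;
    assert(consistent());
    return reclaimed;
  }
};
```

For example, starting from a total and an allocation of 10 each, {{shrinkTotal(8)}} returns 2 and leaves the allocation at 8, so no later operation ever observes {{total < allocation}}.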
> Oversubscription could crash the master due to CHECK failure in the allocator
> -----------------------------------------------------------------------------
>
> Key: MESOS-7639
> URL: https://issues.apache.org/jira/browse/MESOS-7639
> Project: Mesos
> Issue Type: Bug
> Reporter: Yan Xu
>
> As I described in MESOS-7566, the following scenario is possible when the
> agent sends updated oversubscribed resources to the master:
> - The agent's {{UpdateSlaveMessage}} reduces the oversubscribed resources.
> - {{Master::updateSlave}} upon receiving the update would first call
> {{HierarchicalAllocatorProcess::updateSlave}}, followed by
> {{allocator->recoverResources}}.
> - {{HierarchicalAllocatorProcess::updateSlave}} would update
> {{roleSorter.total_}} to reduce the total, so the total could go below the
> allocation.
> - In the subsequent {{allocator->recoverResources}} call the attempt to
> remove outstanding allocation may fail to reduce it to below the total
> because some allocation may not be in outstanding offers. It could be in
> offered resources pending between {{Master::accept}} and {{Master::_accept}}.
> So the end result could still be {{total < allocation}}.
> - Then when {{Master::_accept}} is executed, it calls
> {{allocator->updateAllocation}}, in which the {{total < allocation}}
> condition could trigger the crash.
> The gist is that there are resources that are neither in the master's
> {{offers}} nor tracked in the allocator when {{Master::updateSlave}} is called.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)