Yan Xu created MESOS-6774:
-----------------------------
Summary: Role sorter and quota role sorter can have more copies of
shared resources in allocations than in total.
Key: MESOS-6774
URL: https://issues.apache.org/jira/browse/MESOS-6774
Project: Mesos
Issue Type: Improvement
Components: allocation
Reporter: Yan Xu
The allocator supports shared resources by allocating multiple copies of the
same shared resource so that multiple frameworks can receive it. These extra
copies don't affect the quantity of the sorter's allocations or of its total
pool, so they have no impact on DRF.
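A minimal sketch of the "copies don't change quantity" point, under stated assumptions: {{SharedEntry}} and {{quantity}} are hypothetical names invented for illustration, not Mesos types, and the rule that each distinct shared resource counts once is only a model of the behavior described above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model (not a Mesos type): one shared resource with a scalar
// size and the number of copies currently tracked for it.
struct SharedEntry {
  double size;  // e.g. disk size in MB
  int copies;   // how many copies of this resource are tracked
};

// Quantity as modeled in this sketch: each distinct shared resource is
// counted once, however many copies of it are tracked.
double quantity(const std::map<std::string, SharedEntry>& pool) {
  double sum = 0.0;
  for (const auto& entry : pool) {
    if (entry.second.copies > 0) {
      sum += entry.second.size;
    }
  }
  return sum;
}
```

In this model, raising the copy count of a disk from 1 to 2 leaves {{quantity}} unchanged, which is why a quantity-based fair-share calculation would not notice the extra copy.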
To make resource accounting work, though, when copies of the same resource
are added to a framework's allocation, we also grow the total pool in the
sorter by the same copies (again, adding these copies doesn't affect quantity)
so that the *allocations in a sorter are always bounded by the total pool in
the sorter*.
This invariant is a requirement for the following logic in the allocator to
work:
{code:title=Remove the resources from the framework sorter when they are unallocated from the framework}
frameworkSorters[role]->unallocated(
    frameworkId.value(), slaveId, resources);
frameworkSorters[role]->remove(slaveId, resources);
{code}
For example, if 2 copies of a shared disk are allocated to framework1, the
sorter's total pool holds 2 copies of the disk as well.
However, we currently do this only for the framework sorter below a role,
because the allocator (implicitly) assumes that the role sorter, being the
root-level sorter, has a total pool that is unchanged during allocation or
resource recovery. This is not a problem right now because, given that
assumption, {{Sorter::add(const SlaveID& slaveId, const Resources& resources)}}
and {{Sorter::remove(const SlaveID& slaveId, const Resources& resources)}} are
not called on the role sorter during allocation or resource recovery.
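The bookkeeping described above can be sketched with a toy model. Everything here is an assumption made for illustration: {{ToySorter}}, {{allocateMirrored}} and {{recoverMirrored}} are hypothetical names, the class only counts copies of named resources, and it is not the Mesos {{Sorter}} interface. The invariant is modeled as "copies allocated across all clients never exceed copies in the total pool".

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model of a sorter's bookkeeping; NOT the Mesos Sorter API.
class ToySorter {
public:
  // Grow the total pool by one copy of `resource`.
  void add(const std::string& resource) { ++total_[resource]; }

  // Shrink the total pool by one copy; the pool must hold a copy to remove.
  void remove(const std::string& resource) {
    assert(total_[resource] > 0);
    --total_[resource];
  }

  // Record one copy of `resource` as allocated to `client`.
  void allocated(const std::string& client, const std::string& resource) {
    ++allocations_[client][resource];
  }

  // Record one copy of `resource` as recovered from `client`.
  void unallocated(const std::string& client, const std::string& resource) {
    assert(allocations_[client][resource] > 0);
    --allocations_[client][resource];
  }

  int totalCopies(const std::string& resource) const {
    auto it = total_.find(resource);
    return it == total_.end() ? 0 : it->second;
  }

  // Copies of `resource` allocated across all clients.
  int allocatedCopies(const std::string& resource) const {
    int copies = 0;
    for (const auto& client : allocations_) {
      auto it = client.second.find(resource);
      if (it != client.second.end()) copies += it->second;
    }
    return copies;
  }

  // Modeled invariant: allocations are bounded by the total pool.
  bool invariantHolds(const std::string& resource) const {
    return allocatedCopies(resource) <= totalCopies(resource);
  }

private:
  std::map<std::string, int> total_;
  std::map<std::string, std::map<std::string, int>> allocations_;
};

// Framework-sorter style: every allocated copy is mirrored in the total pool.
void allocateMirrored(ToySorter& sorter, const std::string& client,
                      const std::string& resource) {
  sorter.add(resource);
  sorter.allocated(client, resource);
}

// Mirror of the snippet above: unallocate first, then remove from the pool.
void recoverMirrored(ToySorter& sorter, const std::string& client,
                     const std::string& resource) {
  sorter.unallocated(client, resource);
  sorter.remove(resource);
}
```

With mirrored updates, two copies of a disk allocated to framework1 leave two copies in the pool, so a later remove always finds a copy to take out. A sorter whose pool is fixed at one copy but which records the disk as allocated to two clients breaks the modeled invariant, which only stays harmless as long as remove is never called on it.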
This will likely change with MESOS-6375, when role sorters become
hierarchical, so that not all of them are bound to the physical size of the
cluster. We should revisit the shared resource allocation logic then to make
sure the invariant *allocations in a sorter are always bounded by the total
pool in the sorter* holds.