Benjamin Mahler created MESOS-5524:
--------------------------------------
Summary: Expose resource consumption constraints (quota, shares)
to schedulers.
Key: MESOS-5524
URL: https://issues.apache.org/jira/browse/MESOS-5524
Project: Mesos
Issue Type: Epic
Components: scheduler api, allocation
Reporter: Benjamin Mahler
Currently, schedulers have no visibility into their quota or shares of the
cluster. Exposing this information lets a scheduler make better decisions. As
we start to allow schedulers to decide how they'd like to use a particular
resource (e.g. as non-revocable or revocable), they need visibility into their
quota and shares to decide effectively. Otherwise they may accidentally exceed
their quota and not find out until Mesos replies with TASK_LOST
REASON_QUOTA_EXCEEDED.
We would start by exposing the following information:
* quota: e.g. cpus:10, mem:20, disk:40
* shares: e.g. cpus:20, mem:40, disk:80
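For illustration only, the two example values above could be read into
per-resource maps as follows. This is a minimal sketch: parse_resources is a
hypothetical helper, and the "name:amount" text is just the shorthand used in
this ticket, not an existing Mesos API or wire format.

```python
def parse_resources(text):
    """Parse shorthand like 'cpus:10, mem:20' into {'cpus': 10.0, 'mem': 20.0}.

    Hypothetical helper for this ticket's shorthand; not a Mesos API.
    """
    resources = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas or empty input
        name, _, value = item.partition(":")
        resources[name.strip()] = float(value)
    return resources


# The example values from the list above.
quota = parse_resources("cpus:10, mem:20, disk:40")
shares = parse_resources("cpus:20, mem:40, disk:80")
```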
Currently, quota is used for non-revocable resources. The idea is to use
shares only for consuming revocable resources, since the number of shares
available to a role changes dynamically as resources come and go, frameworks
come and go, or the operator changes the amount of resources set aside for
quota.
By exposing quota and shares, the framework knows when it can consume
additional non-revocable resources (i.e. when it has fewer non-revocable
resources allocated to it than its quota) and when it can consume revocable
resources (always, although in the future it will not be able to revoke another
user's revocable resources while it is above its fair share).
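The non-revocable check described above might look roughly like this on the
scheduler side. This is a sketch under assumptions: the function names and the
plain-dict representation are hypothetical, and treating a resource with no
quota entry as having zero headroom is a choice made for this example, not
documented Mesos behavior.

```python
def non_revocable_headroom(quota, allocated):
    """Per-resource amount that can still be consumed without exceeding quota."""
    return {
        name: max(limit - allocated.get(name, 0.0), 0.0)
        for name, limit in quota.items()
    }


def can_consume_non_revocable(quota, allocated, request):
    """True if every requested amount fits within the remaining quota.

    Assumption: a resource with no quota entry has zero headroom.
    """
    headroom = non_revocable_headroom(quota, allocated)
    return all(
        amount <= headroom.get(name, 0.0) for name, amount in request.items()
    )


quota = {"cpus": 10.0, "mem": 20.0, "disk": 40.0}
allocated = {"cpus": 8.0, "mem": 20.0, "disk": 10.0}  # non-revocable only

headroom = non_revocable_headroom(quota, allocated)
fits = can_consume_non_revocable(quota, allocated, {"cpus": 2.0})
too_much = can_consume_non_revocable(quota, allocated, {"cpus": 1.0, "mem": 1.0})
```

With these example numbers the framework still has room for 2 cpus under its
quota but none for memory, so the second request would have to be satisfied
with revocable resources instead.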
This also allows schedulers to determine whether they have sufficient quota
assigned to them, and to alert the operator if they need more to run safely.
Also, by viewing its fair share, the framework can expose monitoring
information that shows the discrepancy between how much it would like and its
fair share (note that the framework can actually exceed its fair share, but in
the future this will mean increased potential for revocation).
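As a rough illustration of the alerting and monitoring ideas above, a
framework could compare what it would like to run against its quota and its
fair share. The shortfall helper, the dict representation, and the example
demand are all hypothetical; nothing here is an existing Mesos interface.

```python
def shortfall(wanted, limit):
    """Per-resource amount by which `wanted` exceeds `limit` (never negative)."""
    return {
        name: max(amount - limit.get(name, 0.0), 0.0)
        for name, amount in wanted.items()
    }


quota = {"cpus": 10.0, "mem": 20.0, "disk": 40.0}
shares = {"cpus": 20.0, "mem": 40.0, "disk": 80.0}
wanted = {"cpus": 30.0, "mem": 30.0, "disk": 50.0}  # hypothetical demand

# Demand not covered by quota: a candidate for an operator alert.
quota_gap = shortfall(wanted, quota)

# Demand above the fair share: a candidate monitoring metric.
share_gap = shortfall(wanted, shares)

needs_more_quota = any(amount > 0 for amount in quota_gap.values())
```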
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)