[ 
https://issues.apache.org/jira/browse/MESOS-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15522430#comment-15522430
 ] 

Guangya Liu edited comment on MESOS-5524 at 9/28/16 1:39 PM:
-------------------------------------------------------------

[~bmahler] one question I'd like to discuss with you: when exposing the 
resource allocation constraints, do we need to expose the resources at the 
{{role}} level or at the {{framework}} level? 

If we expose them at the {{role}} level, there may be problems when one role 
has multiple frameworks: every framework with the same role will see the same 
resource constraints, and we cannot guarantee that any one framework can 
always get the exposed resources.

The {{framework}} level is also not good. The problem is how to define it: do 
we just split the resources evenly among all {{frameworks}} under the same 
{{role}}, or do it some other way? Splitting the resources evenly among all 
{{frameworks}} under the same {{role}} is also not accurate, as there may be 
one {{framework}} with quite a lot of tasks while the others have none, and 
the framework with a lot of tasks will use up all of the resources.
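
To make the even-split concern concrete, here is a minimal sketch in Python. 
It is an illustration only: {{even_split}} and {{frameworks_over_share}} are 
hypothetical helpers, not part of the Mesos allocator or scheduler API, and 
the numbers reuse the example quota from the issue description.

```python
# Hypothetical illustration only; none of these names exist in Mesos.

def even_split(role_quota, frameworks):
    """Divide a role's resources evenly among its frameworks."""
    if not frameworks:
        raise ValueError("role has no frameworks")
    n = len(frameworks)
    return {
        fw: {name: amount / n for name, amount in role_quota.items()}
        for fw in frameworks
    }


def frameworks_over_share(shares, usage):
    """Return frameworks whose usage exceeds their per-framework share."""
    over = []
    for fw, share in shares.items():
        used = usage.get(fw, {})
        if any(used.get(name, 0) > limit for name, limit in share.items()):
            over.append(fw)
    return over


role_quota = {"cpus": 10, "mem": 20, "disk": 40}
shares = even_split(role_quota, ["A", "B"])

# Framework A runs many tasks, framework B runs none.
usage = {"A": {"cpus": 10, "mem": 20, "disk": 40}, "B": {}}
busy = frameworks_over_share(shares, usage)
```

Here the role as a whole stays within its resources, yet framework A is 
above its evenly split share while framework B's share sits unused, which is 
why an even split would report misleading per-framework constraints.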


was (Author: gyliu):
[~bmahler] one question I'd like to discuss with you: when exposing the 
resource allocation constraints, do we need to expose the resources at the 
{{role}} level or at the {{framework}} level? 

If we expose them at the {{role}} level, there may be problems when one role 
has multiple frameworks: every framework with the same role will see the same 
resource constraints, and we cannot guarantee that any one framework can 
always get the exposed resources.

The {{framework}} level seems more accurate, but even at the {{framework}} 
level it may still not be accurate, because of the allocator's coarse-grained 
mode of resource allocation when there are more frameworks than agents in the 
cluster. Any comments?

> Expose resource allocation constraints (quota, shares) to schedulers.
> ---------------------------------------------------------------------
>
>                 Key: MESOS-5524
>                 URL: https://issues.apache.org/jira/browse/MESOS-5524
>             Project: Mesos
>          Issue Type: Epic
>          Components: allocation, scheduler api
>            Reporter: Benjamin Mahler
>
> Currently, schedulers do not have visibility into their quota or shares of 
> the cluster. By providing this information, we give the scheduler the ability 
> to make better decisions. As we start to allow schedulers to decide how 
> they'd like to use a particular resource (e.g. as non-revocable or 
> revocable), schedulers need visibility into their quota and shares to make an 
> effective decision (otherwise they may accidentally exceed their quota and 
> will not find out until Mesos replies with TASK_LOST REASON_QUOTA_EXCEEDED).
> We would start by exposing the following information:
> * quota: e.g. cpus:10, mem:20, disk:40
> * shares: e.g. cpus:20, mem:40, disk:80
> Currently, quota is used for non-revocable resources and the idea is to use 
> shares only for consuming revocable resources since the number of shares 
> available to a role changes dynamically as resources come and go, frameworks 
> come and go, or the operator manipulates the amount of resources sectioned 
> off for quota.
> By exposing quota and shares, the framework knows when it can consume 
> additional non-revocable resources (i.e. when it has fewer non-revocable 
> resources allocated to it than its quota) or when it can consume revocable 
> resources (always! but in the future, it cannot revoke another user's 
> revocable resources if the framework is above its fair share).
> This also allows schedulers to determine whether they have sufficient quota 
> assigned to them, and to alert the operator if they need more to run safely. 
> Also, by viewing their fair share, the framework can expose monitoring 
> information that shows the discrepancy between how much it would like and its 
> fair share (note that the framework can actually exceed its fair share but in 
> the future this will mean increased potential for revocation).
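
The quota rule in the description above (a framework can take more 
non-revocable resources only while its non-revocable allocation is below its 
quota) could be checked by a scheduler roughly as follows. This is a sketch 
under stated assumptions: the function names and the plain-dict resource 
representation are hypothetical and not part of any Mesos API, and the quota 
values are the example from the description.

```python
# Hypothetical illustration only; these helpers are not Mesos APIs.

def non_revocable_headroom(quota, allocated):
    """Per-resource amount still available under quota (never negative)."""
    return {
        name: max(limit - allocated.get(name, 0), 0)
        for name, limit in quota.items()
    }


def can_consume_non_revocable(quota, allocated, request):
    """True if the request fits in the remaining quota for every resource.

    Assumption: a resource that has no quota entry has zero headroom.
    """
    headroom = non_revocable_headroom(quota, allocated)
    return all(
        amount <= headroom.get(name, 0) for name, amount in request.items()
    )


quota = {"cpus": 10, "mem": 20, "disk": 40}
allocated = {"cpus": 8, "mem": 20, "disk": 0}

headroom = non_revocable_headroom(quota, allocated)
fits_cpus = can_consume_non_revocable(quota, allocated, {"cpus": 2})
fits_mem = can_consume_non_revocable(quota, allocated, {"cpus": 1, "mem": 1})
```

With these example numbers the framework still has headroom for 2 cpus and 
40 disk but none for mem, so the second request would have to be declined or 
run on revocable resources instead.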



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
