brian wickman created MESOS-354:
-----------------------------------

             Summary: oversubscribe resources
                 Key: MESOS-354
                 URL: https://issues.apache.org/jira/browse/MESOS-354
             Project: Mesos
          Issue Type: New Feature
          Components: isolation, master, slave
            Reporter: brian wickman
            Priority: Minor


This proposal is predicated upon offer revocation.

The idea would be to add a new "revoked" status either by (1) piggybacking off 
an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a new 
status update TASK_REVOKED.

In order to augment an offer with metadata about revocability, there are two 
options:
  1) Add a revocable boolean to the Offer and
    a) offer only one type of Offer per slave at a particular time
    b) offer both revocable and non-revocable resources at the same time but 
require frameworks to understand that Offers can contain overlapping resources
  2) Add a revocable_resources field on the Offer which is a superset of the 
regular resources field.  A task launched via launchTask that consumes more 
than resources but no more than revocable_resources becomes a revocable task.  
A task that consumes less than resources is non-revocable.
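
As a rough illustration of option 2, the classification rule could be sketched 
as below.  This is a minimal Python sketch, not Mesos code: the Resources type, 
its fields and classify_task are hypothetical names, and treating a request 
exactly equal to resources as non-revocable is an assumption the proposal 
leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; field names are illustrative only."""
    cpus: float
    mem_gb: float
    disk_gb: float

    def fits_within(self, other: "Resources") -> bool:
        # True when every dimension is <= the matching dimension of `other`.
        return (self.cpus <= other.cpus
                and self.mem_gb <= other.mem_gb
                and self.disk_gb <= other.disk_gb)


def classify_task(requested: Resources,
                  resources: Resources,
                  revocable_resources: Resources) -> str:
    """Return 'non-revocable', 'revocable' or 'rejected' for a request.

    Assumes revocable_resources is a superset of resources (option 2).
    """
    if requested.fits_within(resources):
        # Assumption: a request equal to `resources` counts as non-revocable.
        return "non-revocable"
    if requested.fits_within(revocable_resources):
        return "revocable"
    return "rejected"  # asks for more than the offer contains
```

A framework that wants a guaranteed task would keep its request inside 
resources; anything larger opts in to revocation.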

Revocable tasks suit batch workloads (e.g. hadoop/pig/mapreduce), while 
non-revocable tasks suit online, higher-SLA workloads (e.g. services).

Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk.  
One of these resources is a rate (4 cpu seconds per second) and two of them are 
fixed values (8GB and 20GB respectively, though disk resources can be further 
broken down into spindles - fixed - and iops - a rate.)  In practice, these are 
the maximum resources in the respective dimensions that this task will use.  In 
reality, we provision tasks at some factor below peak, and only hit peak 
resource consumption in rare circumstances or perhaps at a diurnal peak.  

In the meantime, we stand to gain from offering some constant factor of the 
difference (reserved - actual) of non-revocable tasks as revocable 
resources, depending upon our tolerance for revocable task churn.  The main 
challenge is coming up with an accurate short / medium / long-term prediction 
of resource consumption based upon current behavior.
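
The slack calculation above can be sketched as follows.  This is a minimal 
Python sketch under stated assumptions: revocable_slack, the dictionary keys 
and the usage figures are hypothetical, and restricting the factor to [0, 1] 
is an assumption rather than part of the proposal.  The reserved values are 
the 4 cores / 8 GB / 20 GB example from above.

```python
def revocable_slack(reserved, actual, factor):
    """Offer factor * (reserved - actual) per dimension, never below zero."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be in [0, 1]")
    return {
        name: max(0.0, factor * (reserved[name] - actual.get(name, 0.0)))
        for name in reserved
    }


# Reservation from the example task; the usage numbers are made up.
reserved = {"cpus": 4.0, "mem_gb": 8.0, "disk_gb": 20.0}
actual = {"cpus": 1.0, "mem_gb": 6.0, "disk_gb": 5.0}
slack = revocable_slack(reserved, actual, factor=0.5)
```

A lower factor offers less revocable capacity but should mean less revocable 
task churn when the non-revocable task climbs back toward its reservation.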

In many cases it would be OK to be sloppy:
  * CPU / iops / network IO are rates (compressible) and can often run below 
guarantees for brief periods of time while task revocation takes place
  * Memory slack can be provided by enabling swap and dynamically setting swap 
paging boundaries.  Should swap ever be activated, that would be a signal to 
revoke.
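
The swap signal could be sketched as a comparison of two samples of a 
cumulative swap-out counter (on Linux, for example, pswpout in /proc/vmstat).  
This is a minimal Python sketch: should_revoke and its parameters are 
hypothetical, and how the counter is sampled is left out.

```python
def should_revoke(prev_swap_out_pages, curr_swap_out_pages):
    """Treat any pages swapped out since the last sample as a revoke signal."""
    if curr_swap_out_pages < prev_swap_out_pages:
        # A cumulative counter should never go backwards; assume it was
        # reset (e.g. a reboot) and do not revoke on this sample.
        return False
    return curr_swap_out_pages > prev_swap_out_pages
```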

The master / allocator would piggyback on the slave heartbeat mechanism to 
learn of the amount of revocable resources available at any point in time.
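
One way to picture the heartbeat piggyback is sketched below.  This is a 
minimal Python sketch, not the Mesos heartbeat protocol: Heartbeat, Allocator 
and their fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    """Hypothetical slave heartbeat carrying its current revocable slack."""
    slave_id: str
    revocable: dict  # e.g. {"cpus": 1.5, "mem_gb": 1.0}


@dataclass
class Allocator:
    """Hypothetical master-side view of revocable resources per slave."""
    revocable_by_slave: dict = field(default_factory=dict)

    def on_heartbeat(self, hb: Heartbeat) -> None:
        # The latest report replaces whatever the slave said previously.
        self.revocable_by_slave[hb.slave_id] = dict(hb.revocable)

    def total_revocable(self, name: str) -> float:
        # Sum one resource dimension across all slaves heard from so far.
        return sum(r.get(name, 0.0) for r in self.revocable_by_slave.values())
```

Because each heartbeat overwrites the previous report, the master's view is 
only as fresh as the heartbeat interval.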

