[
https://issues.apache.org/jira/browse/MESOS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295839#comment-14295839
]
Niklas Quarfot Nielsen commented on MESOS-354:
----------------------------------------------
Oversubscription means many things and can be considered a subset of the
ongoing effort on optimistic offers, where optimistic offers let the
allocator offer resources:
- To multiple frameworks at once, to increase 'parallelism' (as opposed to the
conservative/pessimistic scheme) and **increase task throughput**.
- As preemptable resources drawn from unallocated but reserved resources, to
**limit reservation slack** (the difference between reserved and allocated
resources).
A third (and equally important) case, which expands on these scenarios, is
oversubscription of _allocated_ resources, which limits the **usage slack**
(the difference between allocated and used resources).
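To make the two kinds of slack concrete, here is a minimal Python sketch. The
NodeResources class and its field names are hypothetical and exist only for
illustration; they are not Mesos types.

```python
from dataclasses import dataclass


@dataclass
class NodeResources:
    """Hypothetical snapshot of one resource (e.g. CPU cores) on one slave."""
    reserved: float   # reserved for a framework or role
    allocated: float  # handed out to running tasks
    used: float       # actually consumed by those tasks

    def reservation_slack(self) -> float:
        # Reserved but not allocated.
        return self.reserved - self.allocated

    def usage_slack(self) -> float:
        # Allocated but not used.
        return self.allocated - self.used


cpus = NodeResources(reserved=16.0, allocated=12.0, used=5.0)
print(cpus.reservation_slack())  # 4.0
print(cpus.usage_slack())        # 7.0
```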
A lot of recent research shows that usage slack can be reduced by 60% while
maintaining the Service Level Objective (SLO) of latency-critical
workloads (1). However, this kind of oversubscription needs policies and
fine-tuning to make sure that best-effort tasks don't interfere with
latency-critical ones. Therefore, we'd like to start a discussion on how such a
system would look in Mesos. I will create a JIRA ticket (linking to this one)
to start the conversation.
(1)
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43017.pdf
> oversubscribe resources
> -----------------------
>
> Key: MESOS-354
> URL: https://issues.apache.org/jira/browse/MESOS-354
> Project: Mesos
> Issue Type: Story
> Components: isolation, master, slave
> Reporter: brian wickman
> Priority: Minor
> Attachments: mesos_virtual_offers.pdf
>
>
> This proposal is predicated upon offer revocation.
> The idea would be to add a new "revoked" status either by (1) piggybacking
> off an existing status update (TASK_LOST or TASK_KILLED) or (2) introducing a
> new status update TASK_REVOKED.
> In order to augment an offer with metadata about revocability, there are
> options:
> 1) Add a revocable boolean to the Offer and
> a) offer only one type of Offer per slave at a particular time
> b) offer both revocable and non-revocable resources at the same time but
> require frameworks to understand that Offers can contain overlapping resources
> 2) Add a revocable_resources field on the Offer which is a superset of the
> regular resources field. By consuming more than resources but no more than
> revocable_resources in a launchTask, the Task becomes a revocable task. If
> launching a task with less than resources, the Task is non-revocable.
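A rough Python sketch of how option 2 could classify a launch request. The
function name and the dict-based resource representation are hypothetical, and
treating a request exactly equal to the regular resources as non-revocable is
an assumption; the description leaves that boundary case open.

```python
def classify_task(requested, resources, revocable_resources):
    """Classify a launch request under option 2.

    All arguments are dicts mapping a resource name to a scalar amount.
    Assumption: a request exactly equal to `resources` counts as
    non-revocable.
    """
    for name, amount in requested.items():
        if amount > revocable_resources.get(name, 0.0):
            raise ValueError(f"request for {name} exceeds the offer")
    # Exceeding the regular resources in any dimension makes the task revocable.
    if any(amount > resources.get(name, 0.0)
           for name, amount in requested.items()):
        return "revocable"
    return "non-revocable"


offer_resources = {"cpus": 4.0, "mem_gb": 8.0}
offer_revocable = {"cpus": 6.0, "mem_gb": 10.0}  # superset of the above

print(classify_task({"cpus": 2.0, "mem_gb": 4.0},
                    offer_resources, offer_revocable))  # non-revocable
print(classify_task({"cpus": 5.0, "mem_gb": 8.0},
                    offer_resources, offer_revocable))  # revocable
```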
> The use cases for revocable tasks are batch tasks (e.g. hadoop/pig/mapreduce)
> and non-revocable tasks are online higher-SLA tasks (e.g. services.)
> Consider a non-revocable task that asks for 4 cores, 8 GB RAM and 20 GB of disk.
> One of these resources is a rate (4 cpu seconds per second) and two of them
> are fixed values (8GB and 20GB respectively, though disk resources can be
> further broken down into spindles - fixed - and iops - a rate.) In practice,
> these are the maximum resources in the respective dimensions that this task
> will use. In reality, we provision tasks at some factor below peak, and only
> hit peak resource consumption in rare circumstances or perhaps at a diurnal
> peak.
> In the meantime, we stand to gain from offering some constant factor of
> the difference (reserved - actual) of non-revocable tasks as
> revocable resources, depending upon our tolerance for revocable task churn.
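The arithmetic can be sketched in a few lines of Python. The function name,
the restriction of the factor to the range 0 to 1, and the floor at zero are
assumptions made for illustration; the description only says "some constant
factor".

```python
def revocable_amount(reserved, actual, factor):
    """Return factor * (reserved - actual), floored at zero.

    `factor` encodes the tolerance for revocable task churn: a higher
    factor offers more slack, but revocations become more likely.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    return factor * max(reserved - actual, 0.0)


# 4 cores reserved, 1.5 currently in use, offer half of the difference.
print(revocable_amount(reserved=4.0, actual=1.5, factor=0.5))  # 1.25
```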
> The main challenge is coming up with an accurate short / medium / long-term
> prediction of resource consumption based upon current behavior.
> In many cases it would be OK to be sloppy:
> * CPU / iops / network IO are rates (compressible) and can often run
> below their guarantees for brief periods while task revocation takes place
> * Memory slack can be provided by enabling swap and dynamically setting
> swap paging boundaries. Should swap ever be activated, that would be a
> signal to revoke.
> The master / allocator would piggyback on the slave heartbeat mechanism to
> learn of the amount of revocable resources available at any point in time.