Chris Bannister created SPARK-10293:
---------------------------------------
Summary: Add support for oversubscription in Mesos
Key: SPARK-10293
URL: https://issues.apache.org/jira/browse/SPARK-10293
Project: Spark
Issue Type: Story
Components: Mesos
Reporter: Chris Bannister
Currently, when running Spark on Mesos, each executor takes all the CPU
resources offered to it. This can lead to cases where a Spark executor holds
all the CPU resources on a single slave but underutilises the CPU allocated
to it.
Mesos added support in 0.23 for oversubscription, where frameworks can be
offered slack resources for CPU, so that if a task is allocated 10 cpus but is
only using 1, the 9 unused cpus can be offered to other frameworks as
revocable resources. If the original task starts using its allocated CPU then
Mesos will preempt the revocable task, killing it.
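The slack arithmetic above can be sketched as follows. This is a simplified
model for illustration only: the function name revocable_cpus is hypothetical,
and it is not how Mesos actually estimates slack.

```python
def revocable_cpus(allocated: float, used: float) -> float:
    """Return the slack CPU (allocated but unused) in this simplified model.

    Hypothetical helper for illustration; Mesos's own slack estimation
    is more involved than a single subtraction.
    """
    if allocated < 0 or used < 0:
        raise ValueError("cpu amounts must be non-negative")
    # A task using more than its allocation leaves no slack to offer.
    return max(allocated - used, 0.0)


# The example from the description: 10 cpus allocated, 1 in use.
slack = revocable_cpus(allocated=10, used=1)
```

With 10 cpus allocated and 1 in use, the model yields 9 cpus of slack, which
matches the figure in the description.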
From a cluster usage perspective it would be very useful to be able to specify
that some jobs are revocable and can be run in slack resources, and that their
tasks should be rescheduled without affecting the job status (i.e. not count
towards job failure) when a task is revoked.
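One way to picture the requested accounting is the sketch below. It is not
Spark's scheduler code: the JobTracker class, its fields, and the reason
strings "finished", "revoked" and "error" are all assumptions made up for
this illustration. The point it shows is that a revoked task is queued to
run again without touching the failure counter.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobTracker:
    """Hypothetical tracker: revocations reschedule, real errors count."""
    max_failures: int
    failures: int = 0
    failed: bool = False
    pending: deque = field(default_factory=deque)

    def on_task_ended(self, task_id: str, reason: str) -> None:
        if reason == "finished":
            return
        if reason == "revoked":
            # Preempted by Mesos: run it again, do not count a failure.
            self.pending.append(task_id)
            return
        # Any other reason is treated as a genuine task failure.
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failed = True
        else:
            self.pending.append(task_id)


tracker = JobTracker(max_failures=2)
for _ in range(5):
    tracker.on_task_ended("task-1", "revoked")
```

In this sketch, five revocations leave failures at 0 and the job still
running, while two genuine errors would mark the job as failed.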