[ https://issues.apache.org/jira/browse/SPARK-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Bannister updated SPARK-10293:
------------------------------------
    Description: 
Currently when running Spark on Mesos, each executor claims all of the CPU
resources offered to it. This can lead to cases where a Spark executor holds
all of the CPU resources on a single slave but underutilises the CPU allocated
to it.

Mesos 0.23 added support for oversubscription, where frameworks can be offered
slack resources: CPU that is already allocated to a task but is not being used.
For example, if a task is allocated 10 CPUs but is only using 1, the 9 unused
CPUs can be offered to other frameworks as revocable resources. If the original
task starts using its allocated CPU, Mesos will preempt the task running on the
revocable resources, killing it.
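To make the slack arithmetic concrete, here is a toy Python model of a single agent. It is only a sketch under stated assumptions: the `Agent` class and its methods are hypothetical and not part of the Mesos API, slack is taken to be simply allocated minus used CPU (real Mesos delegates slack estimation and preemption decisions to pluggable modules), and killing the most recently launched revocable task first is an arbitrary choice.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical toy model of one Mesos agent ("slave"); not a Mesos API."""
    allocated_cpus: float  # CPUs allocated to the original (regular) task
    used_cpus: float       # CPUs that task is actually using
    revocable_tasks: list = field(default_factory=list)  # (name, cpus) pairs

    def slack(self) -> float:
        # Assumption: slack is allocated-but-unused CPU, never negative.
        return max(self.allocated_cpus - self.used_cpus, 0.0)

    def revocable_in_use(self) -> float:
        return sum(cpus for _, cpus in self.revocable_tasks)

    def launch_revocable(self, name: str, cpus: float) -> bool:
        # Accept a revocable task only if it fits in the remaining slack.
        if self.revocable_in_use() + cpus > self.slack():
            return False
        self.revocable_tasks.append((name, cpus))
        return True

    def set_usage(self, used_cpus: float) -> list:
        """The regular task changes its usage; preempt revocable tasks that no
        longer fit in the slack and return the names of the killed tasks."""
        self.used_cpus = used_cpus
        killed = []
        while self.revocable_tasks and self.revocable_in_use() > self.slack():
            name, _ = self.revocable_tasks.pop()  # assumption: newest first
            killed.append(name)
        return killed
```

With `Agent(allocated_cpus=10, used_cpus=1)` there are 9 CPUs of slack; if two revocable tasks take 4 and 5 CPUs and the regular task then rises to 6 CPUs, the slack shrinks to 4 and the 5-CPU task is preempted.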

From a cluster usage perspective it would be very useful to be able to specify
that some jobs are revocable and can be run on slack resources, and that their
tasks should be rescheduled without affecting the job status (i.e. not count
towards job failure) when a task is revoked.
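A minimal sketch of the requested retry behaviour, assuming the scheduler can tell a revocation apart from an ordinary task failure. `TaskSetSketch` and `EndReason` are hypothetical names, not Spark's actual scheduler classes; `max_failures` only mirrors the meaning of Spark's `spark.task.maxFailures` setting (the number of failures of any one task before the job is given up).

```python
from collections import Counter
from enum import Enum, auto


class EndReason(Enum):
    SUCCESS = auto()
    FAILED = auto()   # the task itself failed (exception, bad input, ...)
    REVOKED = auto()  # Mesos preempted the task's revocable resources


class TaskSetSketch:
    """Hypothetical retry bookkeeping; not Spark's TaskSetManager."""

    def __init__(self, task_ids, max_failures=4):
        self.max_failures = max_failures
        self.failures = Counter()   # genuine failures per task id
        self.pending = list(task_ids)
        self.done = set()
        self.aborted = False

    def next_task(self):
        # Hand out the next task to launch, or None if nothing is pending.
        return self.pending.pop(0) if self.pending else None

    def on_task_end(self, task_id, reason):
        if reason is EndReason.SUCCESS:
            self.done.add(task_id)
        elif reason is EndReason.REVOKED:
            # Revocation is not the task's fault: requeue it without
            # counting it towards job failure.
            self.pending.append(task_id)
        else:
            self.failures[task_id] += 1
            if self.failures[task_id] >= self.max_failures:
                self.aborted = True  # give up on the whole job
            else:
                self.pending.append(task_id)
```

With `max_failures=2`, a task can be revoked any number of times and is simply requeued, while two genuine failures of the same task abort the job.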

  was:
Currently when running Spark on Mesos each executor will use all the CPU 
resources offered to it. This can lead to cases where a Spark executor is using 
all the CPU resources on a single slave but is underutilisation the CPU 
allocated to it.

Mesos added support in 0.23 for oversubscription, where frameworks can be 
offered slack resources for CPU, so that if a task is allocated 10 cpus but is 
only using 1, 9 revokable offers will be made to other frameworks. If the 
original task starts using its allocated CPU then Mesos will preempt the 
revokable task, killing it.

From a cluster usage perspective it would be very useful to be able to specify
that some jobs are revokable and can be ran in slack resources, and that they
should be rescheduled without affecting the job status (ie not count towards
job failure) when a task is revoked.


> Add support for oversubscription in Mesos
> -----------------------------------------
>
>                 Key: SPARK-10293
>                 URL: https://issues.apache.org/jira/browse/SPARK-10293
>             Project: Spark
>          Issue Type: Story
>          Components: Mesos
>            Reporter: Chris Bannister



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
