[ https://issues.apache.org/jira/browse/SPARK-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038987#comment-14038987 ]

Andrew Ash commented on SPARK-1882:
-----------------------------------
Yeah, for homogeneous environments, I think you can get full utilization of
both CPU and memory across the cluster. It could work like this:
Suppose each machine has 16 cores and 256GB memory, which is 16GB per core.
Leave {{spark.task.cpus}} at the default of 1 and set {{spark.executor.memory}}
to 16GB. Now each task launched grabs one core and 16GB. Once they're all
taken on a machine, it's fully maxed out in both memory and CPU.

But if we have heterogeneous machines with different CPU:memory ratios, I think
we'd be in trouble. We couldn't pick a single per-task ratio that fully utilizes
all machines, so we'd have either under-utilized memory on machines with a low
CPU:memory ratio or under-utilized CPUs on machines with a high CPU:memory ratio.
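
To make the two cases concrete, here is a rough sketch of the packing arithmetic
in Python. It assumes a simplified model, not Spark's or Mesos's actual
scheduler: every task takes a fixed number of cores and a fixed amount of
memory, and a machine runs as many tasks as fit. The names {{Machine}} and
{{leftover}} are hypothetical, and the 64GB and 512GB machines are made-up
examples; only the 16-core/256GB machine comes from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    cores: int
    memory_gb: int


def leftover(machine, task_cpus=1, task_memory_gb=16):
    """Return (idle_cores, idle_memory_gb) once no further task fits.

    Simplified model (an assumption): each task takes exactly task_cpus
    cores and task_memory_gb of memory.
    """
    tasks = min(machine.cores // task_cpus,
                machine.memory_gb // task_memory_gb)
    return (machine.cores - tasks * task_cpus,
            machine.memory_gb - tasks * task_memory_gb)


# Machine from the comment: 16 cores, 256GB, i.e. 16GB per core.
balanced = Machine(cores=16, memory_gb=256)
# Hypothetical machines with the same core count but other memory sizes.
small_memory = Machine(cores=16, memory_gb=64)
large_memory = Machine(cores=16, memory_gb=512)

for name, m in [("balanced", balanced),
                ("small_memory", small_memory),
                ("large_memory", large_memory)]:
    idle_cores, idle_mem = leftover(m)
    print(f"{name}: {idle_cores} idle cores, {idle_mem}GB idle memory")
```

Under this model the 256GB machine ends up with nothing idle, the 64GB machine
runs out of memory after 4 tasks and leaves 12 cores idle, and the 512GB machine
uses all 16 cores but leaves 256GB of memory idle.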

The suggestion of decoupling cores and memory is a good one -- if each task
accepted an amount of memory proportional to the remaining memory on the
machine, then I think you could get good utilization even across heterogeneous
environments.
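
One possible reading of "proportional to the remaining memory" can be sketched
in Python as follows. This is an illustration under my own assumptions, not an
existing Spark or Mesos feature: each task takes one core, and at launch it is
granted the machine's free memory divided by its free cores. The function name
{{proportional_grants}} is hypothetical.

```python
def proportional_grants(cores, memory_gb):
    """Launch one single-core task per core and return each task's memory grant.

    Assumed policy: a task receives free_memory / free_cores at launch time,
    so the grant scales with whatever the machine still has available.
    """
    grants = []
    free_cores = cores
    free_memory = float(memory_gb)
    while free_cores > 0:
        grant = free_memory / free_cores
        grants.append(grant)
        free_cores -= 1
        free_memory -= grant
    return grants


# Two hypothetical 16-core machines with different memory sizes.
for memory_gb in (64, 512):
    grants = proportional_grants(16, memory_gb)
    print(f"{memory_gb}GB machine: {len(grants)} tasks, "
          f"{grants[0]}GB per task, {sum(grants)}GB used in total")
```

In this model both machines end up with every core busy and all memory handed
out, because the per-task grant adapts to the machine instead of being fixed
cluster-wide.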
> Support dynamic memory sharing in Mesos
> ---------------------------------------
>
> Key: SPARK-1882
> URL: https://issues.apache.org/jira/browse/SPARK-1882
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 1.0.0
> Reporter: Andrew Ash
>
> Fine-grained mode on Mesos currently supports sharing CPUs very well, but
> requires that memory be pre-partitioned according to the executor memory
> parameter. Mesos supports dynamic memory allocation in addition to dynamic
> CPU allocation, so we should utilize this feature in Spark.
> In the code linked below, when the Mesos backend accepts a resource offer it
> only checks that there's enough memory to cover sc.executorMemory, and never
> takes a fraction of the memory available. Memory is accepted all or nothing,
> in an amount fixed by a pre-defined parameter.
> Coarse mode:
> https://github.com/apache/spark/blob/3ce526b168050c572a1feee8e0121e1426f7d9ee/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala#L208
> Fine mode:
> https://github.com/apache/spark/blob/a5150d199ca97ab2992bc2bb221a3ebf3d3450ba/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L114