[
https://issues.apache.org/jira/browse/SPARK-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003893#comment-14003893
]
Andy Konwinski commented on SPARK-1882:
---------------------------------------
It seems like the problem is with heterogeneous environments (machines with
different memory/CPU ratios).
One idea is to replace the single memory value that each Spark executor
requires with a bit of conditional logic: if accepting a partial slot would
leave the machine with less than X GB of memory, just accept all of the memory
in the offer; otherwise, accept default_slot_mem_size. That way a range of
values would work, which could help reduce fragmentation.
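A rough sketch of that conditional logic, in Python only for brevity (the real
change would live in the Scala Mesos backends). The function name, the
parameter names, and the choice to decline offers smaller than the default slot
are my assumptions for illustration, not existing Spark or Mesos APIs:

```python
def memory_to_accept(offered_mb, default_slot_mb, min_leftover_mb):
    """Return how many MB to take from a single offer (0 means decline).

    All names here are hypothetical; min_leftover_mb plays the role of the
    "X GB" threshold mentioned above.
    """
    # Assumption: an offer smaller than the default slot is declined,
    # matching the current "enough memory to cover the executor" check.
    if offered_mb < default_slot_mb:
        return 0

    # Memory that would remain in the offer after taking one default slot.
    leftover_mb = offered_mb - default_slot_mb

    # Taking only the default slot would strand a fragment too small to be
    # useful, so take the whole offer instead.
    if leftover_mb < min_leftover_mb:
        return offered_mb

    # Otherwise take the default slot and leave the rest for other offers.
    return default_slot_mb


if __name__ == "__main__":
    for offered in (3000, 5000, 12000):
        print(offered, "->", memory_to_accept(offered, 4096, 2048))
```

With a 4096 MB default slot and a 2048 MB threshold, an offer of 5000 MB would
be taken whole, while an offer of 12000 MB would yield only the default slot.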
Also, I'm not sure whether Mesos tells you in a resource offer how much total
memory the machine contains (in addition to how much is currently being offered
from that machine), but I'm pretty sure you can get that value from Mesos
somehow. You could also take that value into account when deciding whether to
accept resources (to lower the chance of fragmentation).
> Support dynamic memory sharing in Mesos
> ---------------------------------------
>
> Key: SPARK-1882
> URL: https://issues.apache.org/jira/browse/SPARK-1882
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 1.0.0
> Reporter: Andrew Ash
>
> Fine-grained mode on Mesos currently supports sharing CPUs very well, but
> requires that memory be pre-partitioned according to the executor memory
> parameter. Mesos supports dynamic memory allocation in addition to dynamic
> CPU allocation, so we should utilize this feature in Spark.
> As the code linked below shows, when the Mesos backend accepts a resource
> offer it only checks that there's enough memory to cover sc.executorMemory,
> and never takes a fraction of the memory available. Memory is accepted
> all-or-nothing, in an amount fixed by a pre-defined parameter.
> Coarse mode:
> https://github.com/apache/spark/blob/3ce526b168050c572a1feee8e0121e1426f7d9ee/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala#L208
> Fine mode:
> https://github.com/apache/spark/blob/a5150d199ca97ab2992bc2bb221a3ebf3d3450ba/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L114
--
This message was sent by Atlassian JIRA
(v6.2#6252)