[jira] [Commented] (MESOS-3548) Investigate federations of Mesos masters

Elouan Keryell-Even (JIRA) Mon, 30 Nov 2015 07:23:35 -0800

    [ 
https://issues.apache.org/jira/browse/MESOS-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031948#comment-15031948
 ]


Elouan Keryell-Even commented on MESOS-3548:
--------------------------------------------

My team is also interested in multi-cluster management with Mesos.

For now, we have set up a test architecture consisting of two separate clusters, 
with a single Mesos master managing both of them.

The use case we are interested in is having multiple clusters collaborate, 
each one able to borrow a few slaves from another when facing a load peak 
(this is indeed "bursting"). I think that would imply that each cluster is 
managed by its own Mesos master, and that the masters can communicate with 
one another to lend and borrow resources.
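
As a rough illustration of the lending/borrowing idea, here is a minimal 
sketch in Python. It is not an existing Mesos feature or API: the Cluster 
class, the borrow() and give_back() helpers, and the "keep one idle slave 
in reserve" policy are all hypothetical, and each Cluster object only stands 
in for a cluster run by its own Mesos master.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one cluster managed by its own Mesos master."""
    name: str
    idle: list = field(default_factory=list)      # idle slaves owned by this cluster
    borrowed: dict = field(default_factory=dict)  # slave id -> name of lending cluster

    def lend(self, count, keep=1):
        """Give up to `count` idle slaves, always keeping `keep` for local use."""
        available = max(0, len(self.idle) - keep)
        n = min(count, available)
        lent, self.idle = self.idle[:n], self.idle[n:]
        return lent


def borrow(borrower, lenders, needed):
    """Ask each peer in turn until `needed` slaves are borrowed or peers run out."""
    for lender in lenders:
        if needed <= 0:
            break
        for slave in lender.lend(needed):
            borrower.borrowed[slave] = lender.name
            needed -= 1
    return needed  # number of slaves still missing after asking every peer


def give_back(borrower, clusters_by_name):
    """Return every borrowed slave to its owner once the load peak is over."""
    for slave, owner in borrower.borrowed.items():
        clusters_by_name[owner].idle.append(slave)
    borrower.borrowed.clear()
```

In this sketch, a cluster facing a peak would call borrow() with its peers 
and later call give_back(); how real masters would discover each other and 
exchange these requests is exactly the open question.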

Elouan KERYELL-EVEN
Software engineer @ Atos Integration
Toulouse, France

> Investigate federations of Mesos masters
> ----------------------------------------
>
>                 Key: MESOS-3548
>                 URL: https://issues.apache.org/jira/browse/MESOS-3548
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: federation, mesosphere, multi-dc
>
> In a large Mesos installation, the operator might want to ensure that even if 
> the Mesos masters are inaccessible or failed, new tasks can still be 
> scheduled (across multiple different frameworks). HA masters are only a 
> partial solution here: the masters might still be inaccessible due to a 
> correlated failure (e.g., ZooKeeper misconfiguration/human error).
> To support this, we could support the notion of "hierarchies" or 
> "federations" of Mesos masters. In a Mesos installation with 10k machines, 
> the operator might configure 10 Mesos masters (each of which might be HA) to 
> manage 1k machines each. Then an additional "meta-Master" would manage the 
> allocation of cluster resources to the 10 masters. Hence, the failure of any 
> individual master would impact 1k machines at most. The meta-master might not 
> have a lot of work to do: e.g., it might be limited to occasionally 
> reallocating cluster resources among the 10 masters, or ensuring that newly 
> added cluster resources are allocated among the masters as appropriate. 
> Hence, the failure of the meta-master would not prevent any of the individual 
> masters from scheduling new tasks. A single framework instance probably 
> wouldn't be able to use more resources than have been assigned to a single 
> Master, but that seems like a reasonable restriction.
> This feature might also be a good fit for a multi-datacenter deployment of 
> Mesos: each Mesos master instance would manage a single DC. Naturally, 
> reducing the traffic between frameworks and the meta-master would be 
> important for performance reasons in a configuration like this.
> Operationally, this might be simpler if Mesos processes were self-hosting 
> ([MESOS-3547]).
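
The meta-master arrangement described in the quoted issue can be sketched 
roughly as follows. This is only an illustration under stated assumptions: 
allocate() and add_machines() are hypothetical names, the round-robin and 
"smallest share first" policies are invented for the example, and nothing 
here corresponds to an actual Mesos interface.

```python
def allocate(machines, masters):
    """Round-robin machines across masters so shares differ by at most one."""
    shares = {master: [] for master in masters}
    for i, machine in enumerate(machines):
        shares[masters[i % len(masters)]].append(machine)
    return shares


def add_machines(shares, new_machines):
    """Assign each newly added machine to the master with the smallest share."""
    for machine in new_machines:
        smallest = min(shares, key=lambda master: len(shares[master]))
        shares[smallest].append(machine)


# The numbers from the issue description: 10k machines, 10 masters.
machines = [f"node-{i}" for i in range(10_000)]
masters = [f"master-{i}" for i in range(10)]
shares = allocate(machines, masters)

# Losing any single master affects only the machines in its share.
worst_case = max(len(share) for share in shares.values())
```

With these inputs each master ends up with 1,000 machines, which matches the 
"1k machines at most" failure bound in the description. The meta-master only 
acts when allocate() or add_machines() runs, so its failure would not stop 
the individual masters from scheduling.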



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
