Thanks for pinging again, Mathieu! I think auto-scaling of a Mesos cluster is a nifty feature to have. The only question in my mind (and likely in others') is whether this functionality should reside in Mesos, in a framework, or with an operator. As you mentioned, Netflix took the framework route, but that doesn't necessarily work in a multi-framework environment. If the functionality lies with an operator, it has to be a library (likely a service) so that more people can take advantage of it.
In my mind, it is not hard to imagine having this functionality in Mesos. Since Mesos is in the best position to know the (current and perhaps projected) state of the cluster, it could make smart decisions about the shape and size of the new nodes that can be added. This also becomes interesting in light of the quota work <https://issues.apache.org/jira/browse/MESOS-1791> that we are currently doing. Having said that, I think you can do this today by writing an allocator module. Note that Mesos already provides a requestResources() API call (similar to Wish in your ppt) that is passed to the allocator. You should be able to write an allocator module that takes this signal and talks to your favorite IaaS API to spin up new node(s) if necessary.

On Fri, Jul 31, 2015 at 8:29 AM, Roger Ignazio <rigna...@gmail.com> wrote:
> With the number of IaaS providers out there, and the fact that Mesos doesn't really concern itself with where it's running (IaaS, bare-metal, on-prem, in the cloud), this sounds more like an operations problem than a feature that should be in Mesos core.
>
> By any chance, have you looked at https://github.com/thefactory/autoscale-python? I'd venture to guess that project (or a homegrown solution talking to your IaaS provider's API), combined with some custom AWS AMIs (or vSphere templates or OpenStack images or ...), would satisfy your use case.
>
> -- Roger
>
> On Fri, Jul 31, 2015 at 5:37 AM, VELTEN, MATHIEU <mathieu.vel...@atos.net> wrote:
>
> > Hi,
> >
> > I am currently working on some projects at Atos Toulouse that use Mesos, and we run it on top of a classic IaaS.
> >
> > After playing with Mesos and looking at some of the code, it appears to me that there is no elasticity mechanism in place.
> > A few months ago I opened a JIRA issue, which contains most of the content of this email: https://issues.apache.org/jira/browse/MESOS-2453
> >
> > Here is what I have in mind (see the slides at the link below for a more detailed and visual version ☺):
> >
> > - Add the possibility for a framework to signal that it has some work pending (with or without further semantics about which resources are wanted?).
> > - Modify the Mesos allocation algorithm to call a pluggable driver when no resources are available and at least one framework has some work to do. In this case, the driver should scale up the Mesos cluster by launching VMs. How many VMs, and of which size, is a little tricky to decide without adding semantics to the framework signal.
> > - We should also add a flag somewhere to mark a slave as "volatile", so that we can prefer static resources and shut down volatile slaves after they have been left unused for some time.
> >
> > https://docs.google.com/presentation/d/1eNQSvDQ64gPNbmf0YVPq9tIWLMCbAHExos5WXrm0uqI/edit?usp=sharing
> >
> > Does this look doable to you? What do you think about the principle? Do you think we can add some semantics to the "I have work to do" framework signal without breaking the two-level scheduling principle? I don't think it violates it, since both mechanisms (signaling a need and actually taking a resource from an offer) are fully independent in my proposal, but I feel a little out of my depth to be sure.
> >
> > This proposal doesn't currently address bin packing specifically; however, with the aforementioned modifications in place it should be easy to add, since we know which resources are volatile.
> >
> > I have seen other work (by Netflix, for example) that addresses this problem; however, it always seems to be at the framework level rather than inside the core Mesos architecture. Is there a reason for that, other than lack of time for specification/contribution?
> >
> > http://fr.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud
> >
> > Regards,
> >
> > Mathieu Velten
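The mechanism Mathieu proposes (a "work pending" signal from frameworks, a pluggable driver called when nothing is idle and some framework has work, and "volatile" nodes that are used last and reclaimed after an idle timeout) can be sketched as a small policy loop. This is a minimal, hypothetical sketch, not Mesos code: `ElasticPolicy`, `ScalingDriver`, `PendingWork`, `FakeDriver` and the sizing rule (one VM as large as the largest pending request) are all assumptions made for illustration, and it simplifies by treating each node as wholly idle or busy. In Mesos itself, as suggested above, this logic would sit in an allocator module fed by `requestResources()`.

```python
# Hypothetical sketch: none of these names are Mesos APIs.
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class Node:
    node_id: str
    cpus: float
    mem_mb: int
    volatile: bool = False              # True if launched by the autoscaler
    idle_since: Optional[float] = None  # None while the node runs tasks


@dataclass
class PendingWork:
    # The "I have work to do" signal, with optional resource semantics.
    framework_id: str
    cpus: float
    mem_mb: int


class ScalingDriver(Protocol):
    # Pluggable IaaS driver (hypothetical interface).
    def launch(self, cpus: float, mem_mb: int) -> Node: ...
    def terminate(self, node: Node) -> None: ...


class ElasticPolicy:
    def __init__(self, driver: ScalingDriver, idle_timeout_s: float) -> None:
        self.driver = driver
        self.idle_timeout_s = idle_timeout_s
        self.nodes: Dict[str, Node] = {}
        self.pending: Dict[str, PendingWork] = {}

    def add_static_node(self, node: Node, now: float) -> None:
        node.volatile, node.idle_since = False, now
        self.nodes[node.node_id] = node

    def set_idle(self, node_id: str, since: Optional[float]) -> None:
        # since=None marks the node busy; a timestamp marks it idle from then on.
        self.nodes[node_id].idle_since = since

    def signal_pending(self, work: PendingWork) -> None:
        self.pending[work.framework_id] = work

    def clear_pending(self, framework_id: str) -> None:
        self.pending.pop(framework_id, None)

    def offer_order(self) -> List[Node]:
        # Idle nodes, static first, so volatile nodes are used last and can drain.
        idle = [n for n in self.nodes.values() if n.idle_since is not None]
        return sorted(idle, key=lambda n: (n.volatile, n.node_id))

    def tick(self, now: float) -> None:
        # Scale up: at least one framework has pending work and nothing is idle.
        if self.pending and not self.offer_order():
            # Sizing assumption: one VM as large as the largest pending request.
            cpus = max(w.cpus for w in self.pending.values())
            mem_mb = max(w.mem_mb for w in self.pending.values())
            node = self.driver.launch(cpus, mem_mb)
            node.volatile, node.idle_since = True, now
            self.nodes[node.node_id] = node
        # Scale down: volatile nodes left unused longer than the timeout.
        for node in list(self.nodes.values()):
            if (node.volatile and node.idle_since is not None
                    and now - node.idle_since >= self.idle_timeout_s):
                self.driver.terminate(node)
                del self.nodes[node.node_id]


class FakeDriver:
    # Stand-in for a real IaaS client: records calls instead of making them.
    def __init__(self) -> None:
        self.launched: List[str] = []
        self.terminated: List[str] = []

    def launch(self, cpus: float, mem_mb: int) -> Node:
        node = Node(f"vm-{len(self.launched)}", cpus, mem_mb)
        self.launched.append(node.node_id)
        return node

    def terminate(self, node: Node) -> None:
        self.terminated.append(node.node_id)


driver = FakeDriver()
policy = ElasticPolicy(driver, idle_timeout_s=300)
policy.add_static_node(Node("static-0", cpus=4, mem_mb=8192), now=0)
policy.set_idle("static-0", None)           # static capacity is in use
policy.signal_pending(PendingWork("fw-1", cpus=2, mem_mb=4096))
policy.tick(now=10)                         # nothing idle: launches vm-0
policy.set_idle("vm-0", None)               # fw-1 accepts an offer on vm-0
policy.clear_pending("fw-1")
policy.set_idle("vm-0", 60)                 # fw-1's tasks finish
policy.tick(now=400)                        # idle 340 s >= 300 s: vm-0 is shut down
print(driver.launched, driver.terminated)   # ['vm-0'] ['vm-0']
```

Ordering static nodes first in `offer_order()` is what lets volatile nodes drain and reach the idle timeout. How the trigger should be scoped and how new VMs should be sized remain the open questions raised in the thread.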