Hi,

First, thanks for the various answers, and sorry for the duplicate first email: I thought it hadn't gone through the mailing list because of the attachment, and I failed to notice the pagination when checking the archives...
I didn't know about the requestResources API; it does look a lot like my proposed "wishes". With that API I think I have everything in place to write a custom Allocator that should fit our needs. I also think it could be useful to provide it inside Mesos as an alternative (or even the default?) to the current Allocator, if there is enough interest/usage to justify supporting it.

Here is the design I came up with while reading your various contributions; feel free to comment.

I understand the various concerns about the support burden of introducing specific IaaS drivers, and I agree with you on that: it should be left to a plugin system with really simple methods like scaleUp(Requests) and scaleDown(emptyVolatileSlaves, nonFullVolatileSlaves). All the important and user-specific logic (ramp-up granularity, cooldown period, binpacking) can go into the plugin, so allocator modifications would be kept to a minimum:

- after an unsuccessful round of offers, gather wishes (if any) and call scaleUp (requests can be empty here, which is equivalent to a blind scaleUp);
- after a round of offers, list the empty and non-full volatile slaves and call scaleDown. Binpacking can be handled inside the plugin by migrating tasks between the volatile slaves before killing the remaining ones.

All of this can even be put under an if (plugin != null), so there is basically no impact at all when no auto-scale plugin is provided.

For my information, is there any work in progress regarding task migration? We don't really need it for binpacking, since most of our workflow is stateless and I think we will not allow stateful services on volatile slaves anyway.
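To make the two allocator hooks concrete, here is a minimal Python sketch, under stated assumptions and not Mesos code. Real allocator modules are written in C++ against the Mesos allocator interface. Every name below is hypothetical: the `Request`/`Slave` fields, `Allocator.after_offer_round`, and the toy `CooldownPlugin`. `scale_up`/`scale_down` stand in for the scaleUp/scaleDown methods proposed above. The sketch only shows where the plugin would be called and how the `plugin is None` guard leaves the default path untouched.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    """A framework "wish" received through requestResources() (fields are hypothetical)."""
    framework_id: str
    cpus: float
    mem_mb: float


@dataclass
class Slave:
    slave_id: str
    volatile: bool  # True for IaaS-provisioned slaves that may be reclaimed
    total_cpus: float
    used_cpus: float

    @property
    def empty(self) -> bool:
        return self.used_cpus == 0

    @property
    def full(self) -> bool:
        return self.used_cpus >= self.total_cpus


class AutoScalePlugin(ABC):
    """Hypothetical plugin: all IaaS-specific policy lives behind these two calls."""

    @abstractmethod
    def scale_up(self, requests: List[Request]) -> None: ...

    @abstractmethod
    def scale_down(self, empty_volatile: List[Slave],
                   non_full_volatile: List[Slave]) -> None: ...


class Allocator:
    """Only the proposed auto-scaling hooks; the offer logic itself is omitted."""

    def __init__(self, plugin: Optional[AutoScalePlugin] = None):
        self.plugin = plugin
        self.pending: List[Request] = []

    def request_resources(self, request: Request) -> None:
        self.pending.append(request)  # remember the wish until the next round

    def after_offer_round(self, slaves: List[Slave], successful: bool) -> None:
        if self.plugin is None:
            return  # no plugin: allocator behaviour is unchanged
        if not successful:
            # Wishes are handed to the plugin, which owns retry/cooldown policy.
            # An empty list here amounts to a "blind" scale-up.
            self.plugin.scale_up(list(self.pending))
            self.pending.clear()
        volatile = [s for s in slaves if s.volatile]
        empty = [s for s in volatile if s.empty]
        non_full = [s for s in volatile if not s.empty and not s.full]
        self.plugin.scale_down(empty, non_full)


class CooldownPlugin(AutoScalePlugin):
    """Toy plugin: records actions instead of calling an IaaS API."""

    def __init__(self, cooldown_s: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.last_scale_up = float("-inf")
        self.launched: List[List[Request]] = []
        self.terminated: List[str] = []

    def scale_up(self, requests: List[Request]) -> None:
        if self.clock() - self.last_scale_up < self.cooldown_s:
            return  # still cooling down after the previous scale-up
        self.launched.append(requests)
        self.last_scale_up = self.clock()

    def scale_down(self, empty_volatile: List[Slave],
                   non_full_volatile: List[Slave]) -> None:
        # Reclaim idle volatile slaves only; binpacking the non-full ones
        # would need task migration and is left out of this sketch.
        self.terminated.extend(s.slave_id for s in empty_volatile)
```

Keeping the cooldown inside the plugin rather than the allocator reflects the point above: the allocator only reports "unsuccessful round" and "these volatile slaves are idle or underused", and every policy decision stays pluggable.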
Regards,
Mathieu

-----Original Message-----
From: Yong Feng [mailto:fengyong...@gmail.com]
Sent: Tuesday, August 04, 2015 5:59 AM
To: dev@mesos.apache.org
Subject: Re: Autoscaling in an IaaS environment

Hi Benjamin,

Right, Mesos has to orchestrate shrinking, for example by notifying frameworks to gracefully terminate workloads, or even by making the scheduling decision about which host will be shut down and reclaimed. However, that does not mean Mesos has to be built with the policy that triggers auto-scaling. The auto-scale policy for the Mesos cluster itself tries to meet the overall SLA of the Mesos cluster, which may not be the SLA of a specific framework, even though the two may be related. It is probably better to let Mesos focus on resource sharing among frameworks to meet each framework's SLA, while an outside auto-scaler monitors Mesos and works with it to meet the SLA of Mesos as a whole (all the frameworks).

Thanks,
Yong

On Mon, Aug 3, 2015 at 2:34 PM, Benjamin Mahler <benjamin.mah...@gmail.com> wrote:
> With auto-scaling, shrinking is not as easy as growing. For example, we may need to "defragment" the cluster in order to shrink the number of slaves, and Mesos seems to be in the best position to orchestrate such a process if you want to do this based on frameworks' SLA constraints (this would re-use inverse offers).
>
> On Sun, Aug 2, 2015 at 5:35 PM, Yong Feng <fengyong...@gmail.com> wrote:
>
> > I prefer an auto-scaler outside Mesos as well. As long as Mesos exports enough statistics, an outside auto-scaler should be able to make auto-scale decisions as smart as Mesos itself would. It will also help to decouple resource scheduling from resource infrastructure management. Mesos just needs to focus on how to support adding/removing nodes dynamically and gracefully without impacting running workloads, like the host maintenance/removal feature...
> >
> > Besides, exporting statistics also helps with Mesos diagnosis/troubleshooting, simulation, profiling and so on.
> >
> > The only case an auto-scaler may not support is when the auto-scale decision has an impact on scheduling decisions: for example, a resource manager (like Mesos) doesn't have to reclaim resources from a framework if new nodes with the required resources will be added. However, we could even argue whether making scheduling decisions depend on auto-scale decisions is a valid use case.
> >
> > Thanks
> >
> > On Sun, Aug 2, 2015 at 12:56 PM, tommy xiao <xia...@gmail.com> wrote:
> >
> > > What I want: write a daemon that queries the Mesos framework API to get statistics from Mesos, then invokes the IaaS API to scale the cluster size.
> > >
> > > 2015-08-02 22:32 GMT+08:00 Alex Rukletsov <a...@mesosphere.com>:
> > >
> > > > I agree with Vinod that the Master accumulates a lot of statistics that can be used for smarter decisions about cluster scaling. However, I'm not sure this feature should reside in Mesos. I would rather expose statistics and/or recommendations and let external tooling or an operator do the job.
> > > > On 31 Jul 2015 7:15 pm, "Vinod Kone" <vinodk...@gmail.com> wrote:
> > > >
> > > > > Thanks for pinging again, Mathieu!
> > > > >
> > > > > I think auto-scaling of a Mesos cluster is a nifty feature to have. The only question in my mind (and likely others') is whether this functionality should reside in Mesos, in a framework, or with an operator. As you mentioned, Netflix took the framework route, but it doesn't necessarily work in a multi-framework environment. If the functionality lies with an operator, it has to be a library (likely a service) so that more people can take advantage of it.
> > > > >
> > > > > In my mind, it is not hard to imagine having this functionality in Mesos.
> > > > > Since Mesos is in the best position to know the (current and perhaps projected) state of the cluster, it could make smart decisions about the shape and size of the new nodes that can be added. This also becomes interesting in the face of the quota <https://issues.apache.org/jira/browse/MESOS-1791> work that we are currently doing.
> > > > >
> > > > > Having said that, I think you can do this today by writing an allocator module. Note that Mesos already provides a requestResources() API call (similar to Wish in your ppt) that is passed to the allocator. You should be able to write an allocator module that takes this signal and talks to your favorite IaaS API to spin up new node(s) if necessary.
> > > > >
> > > > >
> > > > > On Fri, Jul 31, 2015 at 8:29 AM, Roger Ignazio <rigna...@gmail.com> wrote:
> > > > >
> > > > > > With the number of IaaS providers out there, and the fact that Mesos doesn't really concern itself with where it's running (IaaS, bare-metal, on-prem, in the cloud), this sounds more like an operations problem than a feature that should be in Mesos core.
> > > > > >
> > > > > > By any chance, have you looked at https://github.com/thefactory/autoscale-python? I'd venture to guess that project (or a homegrown solution talking to your IaaS' API), combined with some custom AWS AMIs (or vSphere templates or OpenStack images or ...), would satisfy your use-case.
> > > > > >
> > > > > > -- Roger
> > > > > >
> > > > > > On Fri, Jul 31, 2015 at 5:37 AM, VELTEN, MATHIEU <mathieu.vel...@atos.net> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am currently working on some projects using Mesos at Atos Toulouse, and we are using it on top of a classical IaaS.
> > > > > > >
> > > > > > > After playing with Mesos and looking at some code, it appears to me that there is no elasticity mechanism in place. Some months ago I opened an issue in Jira, which contains most of the content of this email: https://issues.apache.org/jira/browse/MESOS-2453
> > > > > > >
> > > > > > > Here is what I have in mind (ppt in the following link for the detailed and visual version ☺):
> > > > > > > - Add the possibility for a framework to signal that it has some work pending (with or without further semantics regarding which resources are wished?)
> > > > > > > - Modify the Mesos algorithm to call a pluggable driver when no resource is available and at least one framework has some work to do. In this case the driver should scale up the Mesos cluster by launching VMs. How many, and of which size, is a little tricky to decide without adding semantics to the framework signal.
> > > > > > > - We should also add a flag somewhere to mark a slave as "volatile", so we can prefer the use of static resources and shut down the volatile slaves once they have been left unused for some time.
> > > > > > >
> > > > > > > https://docs.google.com/presentation/d/1eNQSvDQ64gPNbmf0YVPq9tIWLMCbAHExos5WXrm0uqI/edit?usp=sharing
> > > > > > >
> > > > > > > Does it look doable to you? What do you think about the principle?
> > > > > > > Do you think we can add some semantics to the "I have work to do" framework signal without breaking the two-level scheduling principle?
> > > > > > > I don't think it violates it, since both mechanisms (signaling a need and effectively taking a resource from an offer) are fully independent in my proposal, but I feel a little out of my league to be sure.
> > > > > > >
> > > > > > > This proposal currently doesn't specifically address bin packing; however, with the aforementioned modifications in place it should be easy to add, since we know which resources are volatile.
> > > > > > >
> > > > > > > I have seen some other work (by Netflix for example) address this problem, but it always seems to be at the framework level and not inside the core Mesos architecture. Is there a reason for that other than lack of time for specification/contribution?
> > > > > > >
> > > > > > > http://fr.slideshare.net/spodila/aws-reinvent-2014-talk-scheduling-using-apache-mesos-in-the-cloud
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > Mathieu Velten
> > > > > > > This message and any attachments (the "message") are intended solely for the addressee(s). It contains confidential information, that may be privileged. If you receive this message in error, please notify the sender immediately and delete the message. Any use of the message in violation of its purpose, any dissemination or disclosure, either wholly or partially is strictly prohibited, unless it has been explicitly authorized by the sender.
> > > > > > > As its integrity cannot be secured on the internet, Atos and its subsidiaries decline any liability for the content of this message. Although the sender endeavors to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted.
> > >
> > > --
> > > Deshi Xiao
> > > Twitter: xds2000
> > > E-mail: xiaods(AT)gmail.com
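For comparison, the external auto-scaler suggested several times in the thread (a daemon that polls Mesos statistics and then calls the IaaS API) could look roughly like this. It is a hedged Python sketch with several assumptions. It assumes the master's `/metrics/snapshot` endpoint returns keys such as `master/cpus_total`, `master/cpus_used` and `master/slaves_active`; check these against your Mesos version. `IaaSClient` with `add_node()`/`remove_node()` is a hypothetical stand-in for your provider's API. The threshold values are arbitrary examples.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class IaaSClient(Protocol):
    """Hypothetical wrapper around your IaaS provider's API."""

    def add_node(self) -> None: ...

    def remove_node(self) -> None: ...


@dataclass
class Thresholds:
    scale_up_above: float = 0.85    # CPU allocation ratio that triggers growth
    scale_down_below: float = 0.30  # CPU allocation ratio that allows shrinking
    min_slaves: int = 1


def fetch_metrics(master_url: str) -> Dict[str, float]:
    # Assumed endpoint: the master's JSON metrics snapshot.
    with urllib.request.urlopen(f"{master_url}/metrics/snapshot", timeout=5) as resp:
        return json.load(resp)


def decide(metrics: Dict[str, float], t: Thresholds) -> int:
    """Return +1 to add a node, -1 to remove one, 0 to do nothing."""
    total = metrics.get("master/cpus_total", 0.0)
    used = metrics.get("master/cpus_used", 0.0)
    slaves = int(metrics.get("master/slaves_active", 0))
    if slaves < t.min_slaves:
        return 1  # below the floor: always grow
    if total <= 0:
        return 0  # no usable CPU figures: do nothing
    ratio = used / total
    if ratio > t.scale_up_above:
        return 1
    if ratio < t.scale_down_below and slaves > t.min_slaves:
        return -1
    return 0


def run_once(master_url: str, iaas: IaaSClient, t: Thresholds,
             fetch: Callable[[str], Dict[str, float]] = fetch_metrics) -> int:
    """One polling iteration; a real daemon would loop with a cooldown."""
    action = decide(fetch(master_url), t)
    if action > 0:
        iaas.add_node()
    elif action < 0:
        iaas.remove_node()  # should drain the slave first; not handled here
    return action
```

A daemon would call something like `run_once("http://<master>:5050", client, Thresholds())` in a loop. As Benjamin pointed out, shrinking is the hard part. `remove_node()` would need to drain or defragment a slave before terminating it, and this sketch does not attempt that.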