Hi Michael,

Thanks for the detailed reply. Please see my comments inline.

On Thu, Feb 26, 2015 at 8:52 PM, Michael Hall (michaha2) <[email protected]> wrote:

> Hi Lahiru,
>
> I’ve read up a bit of material on metric prediction/tracking and yes, I
> think in terms of metric prediction, curve fitting seems like an
> established technique,
> http://thateye.org/Load_Prediction_and_Hot_Spot_Detection_Models_for_Autonomic_Cloud_Computing.pdf,
> although a cubic spline, a.k.a. 3rd degree polynomial filter, is
> recommended here.
>
> In terms of metric tracking/filtering, were you thinking of using a moving
> average alone? The aforementioned paper mentions EMA (Exponential Moving
> Average). It seems to have better performance and to be computationally
> efficient.
>

Agree. I think we got the same suggestion in the thread mentioned earlier. EMA is a good candidate for this.

> I think separating the problem into tracking and prediction is important
> as architecturally speaking (as well as functionally speaking),
> tracking/filtering and prediction are two different beasts.
>
> 1) Tracking/Filtering should be owned, or at least ‘tuned’, by its
> metric type (cpu load, memory consumption) and adjusted to suit its
> metric’s characteristics accordingly (an example would be that each metric
> type defines its own EMA smoothing configuration). You could even envision
> that it’s really the metric’s responsibility to send already
> ‘tracked/filtered’ values from source, or at least define its own filter()
> method to be applied to its raw measurements.
>
> 2) Prediction can be done on any subsequent ‘tracked/filtered’ data set
> and is definitely a part of the autoscale function; we only need one
> predict(dataset) method.
>

Yes, I was thinking the same. Architecture-wise, we already have that separation in the Stratos model:

1) We have CEP, which collects the data, analyses it, and gives us the function parameters.
2) On the Autoscaler side, we use those function parameters and do the prediction as per the autoscaling requirement (we can predict for different time durations, etc.).

I think we can maintain the same architecture and introduce the new features.

> It’s certainly an interesting topic, I wonder if eventually we’d try to
> model these algorithms for fast prototyping/comparison?
>
> Thanks for wanting my thoughts on the matter, I did actually spend some
> time thinking about this topic a few weeks ago, and have some slides on
> other thoughts as well. Please let me know if you’d be interested in
> looking through those.
>

Definitely! I was doing some research, but could not find much, as the industry has not set any standards yet. I could not invest time in this recently as I had a bit of a tight schedule. Happy to see your stuff.

Thanks.

>
> Mike
>
>
> From: Lahiru Sandaruwan <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, 26 February 2015 10:13
> To: dev <[email protected]>
> Cc: Lakmal Warusawithana <[email protected]>, Imesh Gunaratne <
> [email protected]>
> Subject: Re: Autoscale proposal - transition compensated continuous
> scaling
>
> Hi Michael,
>
> Interesting analysis.
>
> We have had some discussions regarding the prediction/regression
> improvement on the dev list.
>
> It was based around introducing curve fitting, rather than using
> separate "ave + ave grad + ave 2nd grad", which is what we have now.
>
> The discussion was under the subject "[Autoscaling] [Improvement]
> Introducing "curve fitting" for stat prediction algorithm of Autoscaler".
>
> Could you have a look at that discussion as well? Also, I have filed the
> same suggestion in a Jira [1].
>
> Thanks.
>
> [1] https://issues.apache.org/jira/browse/STRATOS-1211
>
> On Fri, Feb 13, 2015 at 9:37 PM, Michael Hall (michaha2) <
> [email protected]> wrote:
>
>> Hi Imesh, Lakmal, Devs,
>>
>> Following the attached email thread, this email is intended as a
>> starting point in formalising the proposal of the autoscale
>> enhancement described below.
>>
>> Kind regards,
>>
>> Mike…
>>
>> Proposal for ‘*Transition Compensated Continuous Scaling*’ enhancement
>> to be added to Apache Stratos Autoscale feature to:
>>
>> 1. Greatly improve (~ x100) the maximum rate of cluster size increase
>> (maximum rate of ascent), when subjected to a sudden increase in load.
>> (Continuous scaling decisions can occur as the decision isn’t delayed
>> (cluster monitor interval) to wait for the system to tend toward a steady
>> state.)
>> 2. Eliminate redundant cartridges being spawned/terminated because
>> of cartridge startup/stop time being larger than a scaling decision
>> interval (cluster monitor interval).
>>
>> *Implementation Overview:*
>>
>> Current:
>>
>> Measured health statistic -> sent to CEP -> 1 minute average -> forward
>> prediction (use ave + ave grad + ave 2nd grad) -> use autoscale policy to
>> calc number of required cartridges -> compare required cartridge count to
>> current cartridge count and scale appropriately
>>
>> Proposed:
>>
>> Measured health statistic -> sent to CEP -> 1 minute ‘moving’ average (per
>> second) -> forward prediction (use ave + ave grad + ave 2nd grad) -> use
>> autoscale policy to calc number of required cartridges -> compare required
>> cartridge count to ( the active (current) cartridge count + the
>> spawning cartridge count - the terminating cartridge count ) and scale
>> appropriately
>>
>> *Implementation of ‘spawning’/‘terminating’ cartridge count:*
>>
>> Currently the autoscale feature is not aware of the number of
>> cartridges in the cluster that are transitioning to and from the ACTIVE
>> state.
>> The proposed enhancement relies on being able to know this count at
>> any given moment in time.
>>
>> This can be implemented by using asynchronous events, where:
>>
>> ‘MEMBER SPAWNED EVENT’ -> increments cluster-cartridge-count-spawned
>> ‘MEMBER ACTIVE EVENT’ -> decrements cluster-cartridge-count-spawned, and
>> increments cluster-cartridge-count-active
>> ‘MEMBER TERMINATING EVENT’ -> increments
>> cluster-cartridge-count-terminating
>> ‘MEMBER TERMINATED EVENT’ -> decrements cluster-cartridge-count-terminating,
>> and decrements cluster-cartridge-count-active
>>
>> *Summary*
>>
>> By compensating the ‘current’ cartridge count/cluster size with the
>> cartridges that are transitioning, we remove the issue of duplicating
>> scaling decisions whilst also allowing the scaling decision to occur
>> continuously, greatly improving our ‘maximum rate of ascent’ when scaling
>> up our cluster in reaction to a sudden increase in load.
>>
>>
>>
>> From: Michael Hall <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Friday, 13 February 2015 11:36
>> To: Lakmal Warusawithana <[email protected]>, "[email protected]" <
>> [email protected]>, Imesh Gunaratne <[email protected]>
>> Subject: Re: autoscale architecture
>>
>> That’s a good plan,
>>
>> My work number is +442088242650
>>
>> I’m around now, but will break for a while for lunch in an hour or so.
>>
>> Cheers
>>
>> From: Lakmal Warusawithana <[email protected]>
>> Date: Friday, 13 February 2015 11:24
>> To: "[email protected]" <[email protected]>, Imesh Gunaratne <
>> [email protected]>, Michael Hall <[email protected]>
>> Subject: Re: autoscale architecture
>>
>> Shall we go for a call, it will be more productive.
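
For illustration only, the event-driven counters from the proposal quoted above might be sketched as below. This is a minimal sketch, not Stratos code: the class name, the underscore-separated event strings, and the scaling_delta helper are hypothetical, and it assumes each member emits its events in order (spawned, active, terminating, terminated).

```python
from dataclasses import dataclass


@dataclass
class ClusterCounts:
    """Hypothetical per-cluster counters mirroring the proposal's names."""
    active: int = 0       # cluster-cartridge-count-active
    spawned: int = 0      # cluster-cartridge-count-spawned
    terminating: int = 0  # cluster-cartridge-count-terminating

    def on_event(self, event: str) -> None:
        # Event strings are illustrative keys, not real Stratos class names.
        if event == "MEMBER_SPAWNED":
            self.spawned += 1
        elif event == "MEMBER_ACTIVE":
            self.spawned -= 1
            self.active += 1
        elif event == "MEMBER_TERMINATING":
            self.terminating += 1
        elif event == "MEMBER_TERMINATED":
            self.terminating -= 1
            self.active -= 1
        else:
            raise ValueError(f"unknown event: {event}")

    def effective_size(self) -> int:
        # active + spawning - terminating, as in the proposed comparison.
        return self.active + self.spawned - self.terminating


def scaling_delta(required: int, counts: ClusterCounts) -> int:
    """Positive: spawn this many; negative: terminate this many; 0: no-op."""
    return required - counts.effective_size()
```

With three active members and one already spawning, a requirement of four yields a delta of zero, so a second scaling decision taken before the new member becomes active would not spawn a duplicate.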
>>
>> On Fri, Feb 13, 2015 at 4:45 PM, Lakmal Warusawithana <[email protected]>
>> wrote:
>>
>>> Hi Michael
>>>
>>> On Fri, Feb 13, 2015 at 4:14 PM, Michael Hall (michaha2) <
>>> [email protected]> wrote:
>>>
>>>> Hi Imesh,
>>>>
>>>> So ‘transition compensated’ refers to cartridges which are
>>>> ‘transitioning’ between SPAWNED-ACTIVE, and TERMINATING-TERMINATED.
>>>>
>>>> What it really means is that the ‘aggregated average’ (referred
>>>> to as <metric>PredictedValue in scaling.drl) is compensated:
>>>>
>>>> 1. As if the ‘spawning’ cartridges are providing resource (although
>>>> they aren’t yet)
>>>> 2. As if the ‘terminating’ cartridges have removed resource
>>>> (although they haven't yet)
>>>>
>>>> Such that the ‘transition compensated aggregated average’ will be
>>>> approximately what the actual aggregated average would be if those
>>>> cartridges had become fully ‘active’ or ‘terminated’. This means the
>>>> ‘transition compensated aggregated average’ is always in a sensible state
>>>> to make a scaling decision.
>>>>
>>>> This then allows us to make a scaling decision as often as we’d like
>>>> (much smaller than 90 seconds, could even be every 1 second), because if
>>>> you take the example that we’ve scaled up, the ‘transition compensated
>>>> aggregated average’ will instantly adjust to N/(N+1) of its raw value
>>>> (copied formula from previous email for reference below), so another
>>>> scaling decision will only occur if the underlying load (aggregated
>>>> average) increases even further.
>>>>
>>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>>
>>>>
>>> I think this is a good proposal; it will definitely help to calculate
>>> more accurate agg-ave values. Since CEP has the topology information we can
>>> easily calculate this.
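
As a worked illustration of the compensation formula quoted above, here is a small sketch. It assumes the denominator is the whole sum (cluster-size + cluster-spawned-size - cluster-terminating-size), which is the reading implied by the remark that the value drops to N over N+1 of its raw value after one scale-up. The function and parameter names are hypothetical.

```python
def transition_compensated_average(agg_ave: float,
                                   cluster_size: int,
                                   spawned: int = 0,
                                   terminating: int = 0) -> float:
    """Scale the raw aggregated average as if spawning members were already
    serving load and terminating members were already gone.

    Assumption: the denominator is the full sum
    (cluster_size + spawned - terminating).
    """
    effective_size = cluster_size + spawned - terminating
    if effective_size <= 0:
        raise ValueError("effective cluster size must be positive")
    return agg_ave * cluster_size / effective_size


# Four active members at 90% average load, one more member spawning:
# the compensated value is 90 * 4 / 5.
print(transition_compensated_average(90.0, 4, spawned=1))  # 72.0
```

If the scale-up threshold were, say, 80%, the compensated 72% would not trigger a second scale-up while the first new member is still starting; only a further rise in the underlying load would.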
>>>
>>> AFAIK, the autoscaler takes care of cartridge states when calculating
>>> the required instance count for a predicted load.
>>>
>>>
>>>
>>>> I’d be more than happy to set up a WebEx meeting to try and explain
>>>> this better? Or another avenue of communication at your preference?
>>>>
>>>> Kind regards,
>>>>
>>>> Mike
>>>>
>>>> From: Imesh Gunaratne <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Friday, 13 February 2015 01:09
>>>>
>>>> To: dev <[email protected]>
>>>> Subject: Re: autoscale architecture
>>>>
>>>> Hi Mike,
>>>>
>>>> Thanks for the detailed explanation of your question. Currently we do
>>>> not have the capability to do this at runtime for a specific cartridge.
>>>> However, we could reduce the global scaling decision interval. This needs to
>>>> be configured at three locations:
>>>>
>>>> 1. Cartridge agent statistics publishing interval (default: 15
>>>> seconds)
>>>> 2. CEP execution plan/faulty member detection interval (default: 1 min)
>>>> 3. Autoscaler cluster monitor interval (default: 90 seconds)
>>>>
>>>> I did not clearly get what you mean by 'transition compensated'. Is
>>>> there a way to explain it further?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Fri, Feb 13, 2015 at 12:26 AM, Michael Hall (michaha2) <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Dev,
>>>>>
>>>>> Thanks for your response Imesh. If it’s ok, I’d like to skip straight
>>>>> to my (rather lengthy) question:
>>>>>
>>>>> Does the autoscaler currently have, or are there plans to introduce,
>>>>> a means to receive an asynchronous event signalling that a cartridge
>>>>> has gone from ‘SPAWNED’ to ‘ACTIVE’ after it is launched from a
>>>>> ‘scale-up’ decision, so that the scaling decision interval can
>>>>> decrease to approximately the metric update interval, and multiple
>>>>> cartridges are not spawned when only one is needed?
>>>>>
>>>>> In more depth:
>>>>>
>>>>> The reason for my question is that by knowing a cartridge is in
>>>>> the ‘SPAWNED’ or ‘TERMINATING’ state, the aggregated metric averages
>>>>> can be ‘transition compensated’, i.e.
>>>>> *transition-compensated-agg-ave = agg-ave * ( cluster-size /
>>>>> ( cluster-size + cluster-spawned-size - cluster-terminating-size ) )*
>>>>> to allow the scaling decisions to occur on a continuous (only
>>>>> throttled by the metric update frequency) basis.
>>>>>
>>>>> It appears that currently a scaling decision occurs every ~minutes. If
>>>>> this becomes ~seconds, it would vastly improve the maximum rate of
>>>>> ascent at which a cluster can scale against a sudden increase in load.
>>>>>
>>>>> It appears that there is no spawning state awareness, which also
>>>>> means several ‘redundant’ instances get spawned when instance startup
>>>>> time is greater than the scale decision interval.
>>>>>
>>>>> Finally:
>>>>>
>>>>> Are there difficulties in tracking ‘SPAWNED’ to ‘ACTIVE’ state on a
>>>>> per cartridge basis? How does this align (if it’s a valid enhancement)
>>>>> with other potential improvements that could be made to the autoscaler?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Mike
>>>>>
>>>>> From: Imesh Gunaratne <[email protected]>
>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>> Date: Thursday, 12 February 2015 18:16
>>>>> To: dev <[email protected]>
>>>>> Subject: Re: autoscale architecture
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> Yes, you can ask any questions you have on Autoscaling here.
>>>>>
>>>>> I don't think we have documented the Autoscaling feature in 4.1.0 at
>>>>> the moment. However, you could find some information here [1].
>>>>> Autoscaling has slightly changed with the Composite Application Model.
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Autoscaler
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, Feb 12, 2015 at 9:33 PM, Michael Hall (michaha2) <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Devs,
>>>>>>
>>>>>> Is there a resource or contact that can help me understand the
>>>>>> current and planned architecture of the autoscaling feature within
>>>>>> Stratos?
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Mike
>>>>>>
>>>>>
>>>>> --
>>>>> Imesh Gunaratne
>>>>>
>>>>> Technical Lead, WSO2
>>>>> Committer & PMC Member, Apache Stratos
>>>>>
>>>
>>> --
>>> Lakmal Warusawithana
>>> Vice President, Apache Stratos
>>> Director - Cloud Architecture; WSO2 Inc.
>>> Mobile : +94714289692
>>> Blog : http://lakmalsview.blogspot.com/
>>>

--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

email: [email protected] blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
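
For reference, the tracking/prediction split discussed at the top of this mail could be sketched roughly as follows. This is only an illustration under stated assumptions: the smoothing factors, metric names, and function names are hypothetical, and predict() uses a simple "value + gradient + second gradient" extrapolation over the last three filtered samples rather than the actual CEP/Autoscaler implementation or a fitted curve.

```python
# Hypothetical per-metric EMA smoothing configuration (values are examples).
SMOOTHING = {"cpu_load": 0.5, "memory_consumption": 0.2}


def ema(values, alpha):
    """Tracking/filtering: exponential moving average of raw measurements."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))  # seed with the first sample
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def predict(dataset, steps_ahead):
    """Prediction: extrapolate from the last three filtered samples using
    the latest value, its first difference, and its second difference."""
    if len(dataset) < 3:
        raise ValueError("need at least three samples")
    a, b, c = dataset[-3:]
    grad = c - b
    second_grad = (c - b) - (b - a)
    return c + grad * steps_ahead + 0.5 * second_grad * steps_ahead ** 2


raw_cpu = [10, 20, 30]
tracked = ema(raw_cpu, SMOOTHING["cpu_load"])
print(tracked)              # [10.0, 15.0, 22.5]
print(predict(tracked, 1))  # 31.25
```

The point of the split is that ema() can be tuned per metric type, while a single predict(dataset) works on any filtered series regardless of where the filtering happened.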
