Re: Autoscale proposal - transition compensated continuous scaling

Michael Hall (michaha2) Thu, 26 Feb 2015 07:35:29 -0800

Hi Lahiru,

I’ve read up a bit on metric prediction/tracking and yes, in terms of metric 
prediction, curve fitting seems to be an established technique; see 
http://thateye.org/Load_Prediction_and_Hot_Spot_Detection_Models_for_Autonomic_Cloud_Computing.pdf,
although that paper recommends a cubic spline (a piecewise 3rd degree 
polynomial fit).


In terms of metric tracking/filtering, were you thinking of using a moving 
average alone? The aforementioned paper mentions EMA (Exponential Moving 
Average), which seems to perform better and to be computationally efficient.
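
To make the EMA idea concrete, here is a minimal Python sketch. It assumes the 
standard recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1), seeded with the 
first sample; the function name `ema` and the alpha value are illustrative 
assumptions, not taken from the paper or from Stratos.

```python
def ema(values, alpha):
    """Exponential moving average of a sequence of raw samples.

    alpha is the smoothing factor in (0, 1]; larger values track the
    raw signal more closely, smaller values smooth more heavily.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    current = None
    for x in values:
        # Seed with the first sample, then blend each new sample in.
        current = x if current is None else alpha * x + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


if __name__ == "__main__":
    print(ema([10, 20, 30], alpha=0.5))
```

Each update needs only the previous smoothed value, so no window of past 
samples has to be stored.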

I think separating the problem into tracking and prediction is important as 
architecturally speaking (as well as functionally speaking), tracking/filtering 
and prediction are two different beasts.

1) Tracking/filtering should be owned, or at least ‘tuned’, by its metric type 
(cpu load, memory consumption) and adjusted to suit that metric’s 
characteristics (for example, each metric type could define its own EMA 
smoothing configuration). You could even envision that it’s really the 
metric’s responsibility to send already ‘tracked/filtered’ values from source, 
or at least to define its own filter() method to be applied to its raw 
measurements.

2) Prediction can be done on any subsequent ‘tracked/filtered’ data set and is 
definitely a part of the autoscale function, so we only need one 
predict(dataset) method.
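
The split between points 1) and 2) could be sketched as follows. This is only 
an illustration under stated assumptions: `MetricType`, its `window` setting 
and the sample metric names are hypothetical, a plain trailing moving average 
stands in for whichever filter each metric would really use, and a 
least-squares straight line stands in for the curve fitting discussed on the 
list.

```python
from dataclasses import dataclass


@dataclass
class MetricType:
    """A metric that owns its own filtering configuration (hypothetical)."""
    name: str
    window: int  # per-metric smoothing setting, an assumed parameter

    def filter(self, raw):
        """Trailing moving average over the last `window` raw samples."""
        out = []
        for i in range(len(raw)):
            chunk = raw[max(0, i - self.window + 1): i + 1]
            out.append(sum(chunk) / len(chunk))
        return out


def predict(dataset, steps_ahead=1):
    """Single metric-agnostic predictor: fit a least-squares line through
    (index, value) and extrapolate `steps_ahead` samples past the end."""
    n = len(dataset)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_t = (n - 1) / 2
    mean_y = sum(dataset) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(dataset))
    slope = sxy / sxx
    return mean_y + slope * ((n - 1 + steps_ahead) - mean_t)


if __name__ == "__main__":
    cpu_load = MetricType("cpu_load", window=2)
    filtered = cpu_load.filter([2, 4, 6, 8])
    print(filtered, predict(filtered))
```

The autoscaler side only ever calls predict() on already filtered data, so 
swapping the filter for one metric does not touch the prediction code.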

It’s certainly an interesting topic, I wonder if eventually we’d try to model 
these algorithms for fast prototyping/comparison?

Thanks for wanting my thoughts on the matter, I did actually spend some time 
thinking about this topic a few weeks ago, and have some slides on other 
thoughts as well. Please let me know if you’d be interested in looking through 
those.

Mike


From: Lahiru Sandaruwan <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, 26 February 2015 10:13
To: dev <[email protected]>
Cc: Lakmal Warusawithana <[email protected]>, Imesh Gunaratne <[email protected]>
Subject: Re: Autoscale proposal - transition compensated continuous scaling

Hi Michael,

Interesting analysis.

We have had some discussions regarding the prediction/regression improvement 
on the dev list.

It was based around introducing curve fitting, rather than using the separate 
"ave + ave grad + ave 2nd grad" that we have now.
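
One way to read "ave + ave grad + ave 2nd grad" is as a second-order 
extrapolation of the averaged metric. The sketch below assumes that shape 
(average + gradient * t + 0.5 * second gradient * t^2); it is not copied from 
the Stratos code, and the function name, units and sample values are 
hypothetical.

```python
def predict_forward(average, gradient, second_gradient, seconds_ahead):
    """Second-order extrapolation of a metric `seconds_ahead` into the future.

    average         -- current averaged metric value
    gradient        -- averaged first derivative (per second)
    second_gradient -- averaged second derivative (per second squared)
    """
    t = seconds_ahead
    return average + gradient * t + 0.5 * second_gradient * t * t


if __name__ == "__main__":
    # Assumed sample: 50% load, rising 0.5%/s, accelerating 0.01%/s^2.
    print(predict_forward(50.0, 0.5, 0.01, 60))
```

Curve fitting would replace the three separately averaged terms with 
coefficients fitted to the recent samples in one step.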

The discussion was under the subject "[Autoscaling] [Improvement] Introducing 
"curve fitting" for stat prediction algorithm of Autoscaler".

Could you have a look at that discussion as well? I have also filed the same 
suggestion in a Jira [1].

Thanks.

[1] https://issues.apache.org/jira/browse/STRATOS-1211

On Fri, Feb 13, 2015 at 9:37 PM, Michael Hall (michaha2) <[email protected]> wrote:
Hi Imesh, Lakmal, Devs,

Following the attached email thread, this email is intended as a starting point 
in formalising the proposal for the following autoscale enhancement.

Kind regards,

Mike…

Proposal for a ‘Transition Compensated Continuous Scaling’ enhancement to the 
Apache Stratos Autoscale feature, in order to:

  1.  Greatly improve (by roughly 100x) the maximum rate of cluster size 
increase (maximum rate of ascent) when subjected to a sudden increase in load. 
Scaling decisions can occur continuously because a decision no longer has to 
be delayed by the cluster monitor interval while the system tends toward a 
steady state.
  2.  Eliminate redundant cartridges being spawned/terminated because 
cartridge startup/stop takes longer than a scaling decision interval (cluster 
monitor interval).

Implementation Overview:

Current:

Measured health statistic -> sent to CEP -> 1 minute average -> forward 
prediction (use ave + ave grad + ave 2nd grad) -> use autoscale policy to calc 
number of required cartridges -> compare required cartridge count to current 
cartridge count and scale appropriately

Proposed:

Measured health statistic -> sent to CEP -> 1 minute ‘moving’ average (per 
second) -> forward prediction (use ave + ave grad + ave 2nd grad) -> use 
autoscale policy to calc number of required cartridges -> compare required 
cartridge count to ( the active (current) cartridge count + the spawning 
cartridge count - the terminating cartridge count ) and scale appropriately

Implementation of ‘spawning’/‘terminating’ cartridge count:

Currently the autoscale feature is not aware of the number of cartridges in the 
cluster that are transitioning to and from the ACTIVE state. The proposed 
enhancement relies on being able to know this count at any given moment in time.

This can be implemented by using asynchronous events, where:

‘MEMBER SPAWNED EVENT’ -> increments cluster-cartridge-count-spawned
‘MEMBER ACTIVE EVENT’ -> decrements cluster-cartridge-count-spawned, and 
increments cluster-cartridge-count-active
‘MEMBER TERMINATING EVENT’ -> increments cluster-cartridge-count-terminating
‘MEMBER TERMINATED EVENT’ -> decrements cluster-cartridge-count-terminating, 
and decrements cluster-cartridge-count-active
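
The four event handlers above can be sketched in Python as below. The class 
name, the underscore event identifiers and the `compensated_size()` helper are 
hypothetical; only the increment/decrement rules and the 
"active + spawning - terminating" comparison come from this proposal.

```python
class ClusterCounters:
    """Tracks cartridge counts per state for one cluster (illustrative)."""

    def __init__(self, active=0):
        self.active = active
        self.spawned = 0
        self.terminating = 0

    def on_event(self, event):
        """Apply one asynchronous member lifecycle event."""
        if event == "MEMBER_SPAWNED":
            self.spawned += 1
        elif event == "MEMBER_ACTIVE":
            self.spawned -= 1
            self.active += 1
        elif event == "MEMBER_TERMINATING":
            self.terminating += 1
        elif event == "MEMBER_TERMINATED":
            self.terminating -= 1
            self.active -= 1
        else:
            raise ValueError(f"unknown event: {event}")

    def compensated_size(self):
        """Cluster size once all in-flight transitions have completed."""
        return self.active + self.spawned - self.terminating


if __name__ == "__main__":
    counters = ClusterCounters(active=3)
    counters.on_event("MEMBER_SPAWNED")
    # The spawning member already counts toward the size used for scaling.
    print(counters.compensated_size())
```

Note that a completed transition (ACTIVE or TERMINATED) leaves 
compensated_size() unchanged, which is what prevents a second scaling 
decision for the same member.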

Summary

By compensating the ‘current’ cartridge count/ cluster size, with the 
cartridges that are transitioning, we remove the issue of duplicating scaling 
decisions whilst also allowing the scaling decision to occur continuously, 
greatly improving our ‘maximum rate of ascent’ when scaling up our cluster in 
reaction to a sudden increase in load.



From: Michael Hall <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, 13 February 2015 11:36
To: Lakmal Warusawithana <[email protected]>, "[email protected]" <[email protected]>, Imesh Gunaratne <[email protected]>
Subject: Re: autoscale architecture

That’s a good plan,

My work number is +442088242650

I’m around now, but will break for a while for lunch in an hour or so.

Cheers

From: Lakmal Warusawithana <[email protected]>
Date: Friday, 13 February 2015 11:24
To: "[email protected]" <[email protected]>, Imesh Gunaratne <[email protected]>, Michael Hall <[email protected]>
Subject: Re: autoscale architecture

Shall we go for a call? It will be more productive.

On Fri, Feb 13, 2015 at 4:45 PM, Lakmal Warusawithana <[email protected]> wrote:
Hi Michael

On Fri, Feb 13, 2015 at 4:14 PM, Michael Hall (michaha2) <[email protected]> wrote:
Hi Imesh,

So ‘transition compensated’ refers to cartridges which are ‘transitioning’ 
between SPAWNED and ACTIVE, or between TERMINATING and TERMINATED.

What it really means is that the ‘aggregated average’ (referred to as 
<metric>PredictedValue in scaling.drl) is compensated:

  1.  As if the ‘spawning’ cartridges are providing resource (although they 
aren’t yet)
  2.  As if the ‘terminating’ cartridges have removed resource (although they 
haven’t yet)

Such that the ‘transition compensated aggregated average’ will be 
approximately what the actual aggregated average would be if those cartridges 
had become fully ‘active’ or ‘terminated’. This means the ‘transition 
compensated aggregated average’ is always in a sensible state to make a scaling 
decision.

This then allows us to make a scaling decision as often as we’d like (at 
intervals much shorter than 90 seconds, possibly even every second). Take the 
example where we’ve just scaled up by one cartridge from a cluster of size N: 
the ‘transition compensated aggregated average’ will instantly adjust to 
N/(N+1) of its raw value (formula copied from the previous email for reference 
below), so another scaling decision will only occur if the underlying load 
(aggregated average) increases even further.

transition-compensated-agg-ave = agg-ave * ( cluster-size / ( cluster-size + 
cluster-spawned-size - cluster-terminating-size ) )
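
As a worked sketch of this formula in Python (the function and parameter names 
are hypothetical, and the guard against a non-positive denominator is an added 
assumption rather than part of the proposal):

```python
def transition_compensated_average(agg_ave, cluster_size,
                                   spawned=0, terminating=0):
    """Scale the raw aggregated average as if in-flight transitions had
    already completed: spawning members add capacity, terminating
    members remove it."""
    effective_size = cluster_size + spawned - terminating
    if effective_size <= 0:
        raise ValueError("effective cluster size must be positive")
    return agg_ave * cluster_size / effective_size


if __name__ == "__main__":
    # Assumed sample: 3 active members at 90% average load, 1 spawning.
    print(transition_compensated_average(90.0, 3, spawned=1))
```

With N = 3 and one spawning member, the compensated value is 3/4 of the raw 
value, matching the N/(N+1) adjustment described above.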


 I think this is a good proposal; it will definitely help to calculate more 
accurate agg-ave values. Since CEP has the topology information, we can easily 
calculate this.

AFAIK, the autoscaler takes care of cartridge states when calculating the 
required instance count for a predicted load.


I’d be more than happy to set up a webex meeting to try and explain this 
better, or to use another avenue of communication if you prefer.

Kind regards,

Mike

From: Imesh Gunaratne <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, 13 February 2015 01:09
To: dev <[email protected]>
Subject: Re: autoscale architecture

Hi Mike,

Thanks for the detailed explanation of your question. Currently we do not have 
the capability to do this at runtime for a specific cartridge. However, we 
could reduce the global scaling decision interval. This needs to be configured 
at three locations:

1. Cartridge agent statistics publishing interval (default: 15 seconds)
2. CEP execution plan/faulty member detection interval (default: 1 min)
3. Autoscaler cluster monitor interval (default: 90 seconds)

I did not clearly get what you mean by 'transition compensated'. Is there a way 
to explain it further?

Thanks


On Fri, Feb 13, 2015 at 12:26 AM, Michael Hall (michaha2) <[email protected]> wrote:
Hi Dev,

Thanks for your response Imesh. If it’s ok, I’d like to skip straight to my 
(rather lengthy) question:

Does the autoscaler have, currently or in its plans, a means to receive an 
asynchronous event signalling that a cartridge has gone from ‘SPAWNED’ to 
‘ACTIVE’ after it is launched from a ‘scale-up’ decision, so that the scaling 
decision interval can decrease to approximately the metric update interval 
without multiple cartridges being spawned when only one is needed?

In more depth:

The reason for my question is that by knowing a cartridge is in the ‘SPAWNED’ 
or ‘TERMINATING’ state, the aggregated metric averages can be ‘transition 
compensated’, i.e.
transition-compensated-agg-ave = agg-ave * ( cluster-size / ( cluster-size + 
cluster-spawned-size - cluster-terminating-size ) )
to allow the scaling decisions to occur on a continuous basis (throttled only 
by the metric update frequency).

It appears that scaling decisions currently occur on the order of minutes. If 
this became seconds, it would vastly improve the maximum rate of ascent at 
which a cluster can scale against a sudden increase in load.

It appears that there is no spawning state awareness, which also means several 
‘redundant’ instances get spawned, when instance startup time is greater than 
the scale decision interval.
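
The redundant-spawn effect can be reproduced with a small simulation. This is 
a sketch under assumed numbers (4 members required, 3 active, 180 second 
startup, 90 second decision interval); the `simulate` function and its 
parameters are hypothetical and do not model Stratos itself.

```python
def simulate(required, active, startup_s, interval_s, horizon_s, compensate):
    """Count members spawned over `horizon_s` seconds of scaling decisions.

    compensate=False mimics a scaler that only sees ACTIVE members;
    compensate=True also counts members that are still starting up.
    """
    pending = []  # activation times of spawned-but-not-yet-active members
    spawned_total = 0
    for now in range(0, horizon_s, interval_s):
        # Promote members whose startup has completed by this decision.
        active += sum(1 for ready_at in pending if ready_at <= now)
        pending = [ready_at for ready_at in pending if ready_at > now]
        known = active + (len(pending) if compensate else 0)
        missing = max(0, required - known)
        pending += [now + startup_s] * missing
        spawned_total += missing
    return spawned_total


if __name__ == "__main__":
    for compensate in (False, True):
        print(compensate, simulate(4, 3, 180, 90, 360, compensate))
```

Without spawning awareness the second decision at 90 seconds still sees only 
3 active members and spawns again; with it, the pending member is counted and 
only one is spawned.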

Finally:

Are there difficulties in tracking the ‘SPAWNED’ to ‘ACTIVE’ transition on a 
per cartridge basis? And how does this align (if it’s a valid enhancement) 
with other potential improvements that could be made to the autoscaler?

Regards,

Mike

From: Imesh Gunaratne <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, 12 February 2015 18:16
To: dev <[email protected]>
Subject: Re: autoscale architecture

Hi Michael,

Yes you can ask any questions you have on Autoscaling here.

I don't think we have documented the Autoscaling feature in 4.1.0 at the 
moment. However, you could find some information here [1]. Autoscaling has 
changed slightly with the Composite Application Model.

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Autoscaler

Thanks

On Thu, Feb 12, 2015 at 9:33 PM, Michael Hall (michaha2) <[email protected]> wrote:
Hi Devs,

Is there a resource or contact that can help me understand the current and 
planned architecture of the autoscaling feature within Stratos?

Best Regards,

Mike



--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos






--
Lakmal Warusawithana
Vice President, Apache Stratos
Director - Cloud Architecture; WSO2 Inc.
Mobile : +94714289692
Blog : http://lakmalsview.blogspot.com/








--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

email: [email protected] blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
