[ 
https://issues.apache.org/jira/browse/STORM-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643117#comment-14643117
 ] 

Jon Weygandt commented on STORM-594:
------------------------------------

As I wrote this, I realized that my idea of auto-scaling could be different from 
others'. It is to ensure that I have enough capacity on "average" to handle the 
load. Others may want to optimize for overall system latency, which would call 
for a different algorithm. This auto-scale change will be a plugin, so one can 
try another implementation optimized for that case should ultra-low latency be 
a concern.

Some thoughts on the latency, capacity and input queue depth of a bolt with 
respect to auto-scaling; these per-bolt measures are different from 
total-system measures.

Latency is the time it takes the bolt to process a Tuple. Short of a code 
change, or a change in a dependent output system (e.g. a Kafka output bolt), 
the latency will remain roughly constant as load changes. Given the latency, 
the maximum throughput of the bolt is known, and it generally remains constant. 
For CPU-bound bolts, provided you don't overload the system, throughput will 
scale linearly with parallelism. Similarly for a Kafka bolt, provided you don't 
overload Kafka. Even when the bolt is running at 100% you can figure out the 
throughput. It may not be perfect, but it is fairly good for the systems I've 
worked with.
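
To make that arithmetic concrete, here is a minimal sketch (the class and the 
numbers are mine, purely illustrative; in practice the latency would come from 
Storm's built-in execute-latency metric):

{code:java}
// Illustrative capacity arithmetic for a CPU-bound bolt: if each tuple takes
// latencyMs to process, one executor can sustain roughly 1000/latencyMs
// tuples/sec, and the executor count follows from the target rate.
public final class BoltCapacityEstimate {

    /** Tuples/sec one executor can sustain at the given per-tuple latency. */
    static double perExecutorThroughput(double latencyMs) {
        return 1000.0 / latencyMs;
    }

    /** Executors needed to keep average utilization under the given headroom (e.g. 0.8). */
    static int executorsNeeded(double targetTuplesPerSec, double latencyMs, double headroom) {
        double perExecutor = perExecutorThroughput(latencyMs) * headroom;
        return (int) Math.ceil(targetTuplesPerSec / perExecutor);
    }

    public static void main(String[] args) {
        // 2 ms per tuple -> ~500 tuples/sec per executor; at 80% headroom,
        // 10000 tuples/sec needs ceil(10000 / 400) = 25 executors.
        System.out.println(executorsNeeded(10000, 2.0, 0.8));
    }
}
{code}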

Queue depth is very different and really interesting, but it is a function of 
many things. It is also erratic and difficult to measure properly. Generally, 
if the system is not overloaded, the instantaneous queue depth will be zero 
quite often. If the inbound load were perfectly constant, in theory, the queue 
depth would be either 1 or 0, never greater than 1. When the inbound load is 
erratic (e.g. it bursts), the queue fills up, but a bursty inbound load also 
has the opposite side, periods of low rate, and these give the system time to 
drain the queue. What happens when the queue fills up? Hopefully no one builds 
an unbounded queue (OOM!); it will either need to overflow and drop data, or 
apply backpressure upstream. In the last system I worked on, the messaging 
system did not allow backpressure, and when an individual partition was running 
around 400K/sec we could have queue depths of 100K. But they would always 
return to zero, as the downstream components were able to handle the average 
load. As long as the queue depth fluctuates between zero and some maximum, you 
have a system that is not overloaded. Even with backpressure we would still see 
this same pattern. This is why I find this metric difficult to work with for 
auto-scaling.

Further, measuring queue depth by sampling it periodically is difficult: you 
would need a large number of samples to state that the queue was "at zero 2% of 
the time". In my prior system, max queue depth was extremely important for me, 
so I instrumented the input side of the queue to record the maximum depth with 
each put, and reset it at each sampling of the metric.
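
A minimal sketch of that instrumentation (the class and method names here are 
mine, not from that system):

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the "max depth per sampling interval" idea: the producer records
// the observed depth on every put, and the metrics consumer reads-and-resets
// the maximum each time it samples.
public final class MaxDepthRecorder<T> {
    private final BlockingQueue<T> queue;
    private final AtomicInteger maxDepth = new AtomicInteger(0);

    public MaxDepthRecorder(BlockingQueue<T> queue) {
        this.queue = queue;
    }

    /** Producer side: enqueue and remember the highest depth seen. */
    public void put(T item) throws InterruptedException {
        queue.put(item);
        int depth = queue.size();
        int seen;
        while (depth > (seen = maxDepth.get())) {
            if (maxDepth.compareAndSet(seen, depth)) {
                break;
            }
        }
    }

    /** Metrics side: return the max depth since the last sample and reset it. */
    public int sampleAndReset() {
        return maxDepth.getAndSet(0);
    }
}
{code}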

Now, some will be concerned about overall system latency, which is generally 
not my concern. It would only matter if someone said "99% of the data must be 
delivered in 100ms or less". If your latency is measured in seconds, I would 
say throughput is the primary goal. If you don't overload the system, latency 
is generally driven by the speed of your algorithms and by bursting. If you 
think you have sub-second latencies as a requirement, then bursting, GC, 
backpressure and acknowledgment mechanisms will be of concern as well.

> Auto-Scaling Resources in a Topology
> ------------------------------------
>
>                 Key: STORM-594
>                 URL: https://issues.apache.org/jira/browse/STORM-594
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: HARSHA BALASUBRAMANIAN
>            Assignee: Pooyan Jamshidi
>            Priority: Minor
>         Attachments: Algorithm for Auto-Scaling.pdf, Project Plan and 
> Scope.pdf
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> A useful feature missing in Storm topologies is the ability to auto-scale 
> resources, based on a pre-configured metric. The feature proposed here aims 
> to build such an auto-scaling mechanism using a feedback system. A brief 
> overview of the feature is provided here. The finer details of the required 
> components and the scaling algorithm (uses a Feedback System) are provided in 
> the PDFs attached.
> Brief Overview:
> Topologies may get created with or (ideally) without parallelism hints and 
> tasks in their bolts and spouts before they are submitted. If auto-scaling is 
> set in the topology (using a Boolean flag), the topology will also get 
> submitted to the auto-scale module.
> The auto-scale module will read a pre-configured metric (threshold/min) from 
> a configuration file. Using this value, the topology's resources will be 
> modified till the threshold is reached. At each stage in the auto-scale 
> module's execution, feedback from the previous execution will be used to tune 
> the resources.
> The systems that need to be in place to achieve this are:
> 1. Metrics which provide the current threshold (number of acks per minute) 
> for a topology's spouts and bolts.
> 2. Access to Storm's CLI tool, which can change a topology's resources at 
> runtime (see the sketch below).
> 3. A new java or clojure module which runs within the Nimbus daemon or in 
> parallel to it. This will be the auto-scale module.
> Limitations: (This is not an exhaustive list. More will be added as the 
> design matures. Also, some of the points here may get resolved.)
> To test the feature, there will be a number of limitations in the first 
> release. As the feature matures, it will be allowed to scale more.
> 1. The auto-scale module will be limited to a few topologies (maybe 4 or 5 at 
> maximum)
> 2. New bolts will not be added to scale a topology. This feature will be 
> limited to increasing the resources within the existing topology.
> 3. Topology resources will not be decreased when a topology is running with 
> more than the required resources (except in a few cases).
> 4. This feature will work only for long-running topologies, where the input 
> threshold can become equal to or greater than the required threshold.
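
For item 2 of the quoted list above, the runtime resource change maps onto 
Storm's existing rebalance operation. A minimal sketch, assuming the 0.9.x-era 
backtype.storm Thrift client (topology name, component name and parallelism 
are placeholders):

{code:java}
import java.util.Collections;
import java.util.Map;

import backtype.storm.generated.RebalanceOptions;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

// Sketch of how an auto-scale module could apply a scaling decision by asking
// Nimbus to rebalance a running topology. "my-topology", "my-bolt" and the
// parallelism of 8 are placeholders.
public final class RebalanceSketch {
    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig();
        NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);
        try {
            RebalanceOptions options = new RebalanceOptions();
            options.set_wait_secs(30);  // let in-flight tuples drain first
            options.set_num_executors(Collections.singletonMap("my-bolt", 8));
            nimbus.getClient().rebalance("my-topology", options);
        } finally {
            nimbus.close();
        }
    }
}
{code}

The same change can be made from the command line with 
"storm rebalance my-topology -w 30 -e my-bolt=8".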



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
