[jira] [Commented] (STORM-594) Auto-Scaling Resources in a Topology

Jon Weygandt (JIRA) Fri, 24 Jul 2015 08:30:20 -0700

    [ 
https://issues.apache.org/jira/browse/STORM-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640599#comment-14640599
 ]


Jon Weygandt commented on STORM-594:
------------------------------------

I believe it will be a long time before someone creates a single auto-scale 
solution that will suit all Storm users. My solution will have a "parallelism 
constraint", which does not seem bad, as is seems many users already behave 
this way. But I realize there will be others who don't like this constraint. To 
that end, I will attempt to create a layer whereby one can plug in different 
AutoScaleManagers (I'm going to propose a name like "AutoScaleManager", as 
Threshold Monitor Daemon seems to imply implementations). If we do it right, my 
implementation can coexist with UIUC's implementation, and others. I wish them 
well in their solution, my motivation is that I'm going to be responsible for 
100's of independent Topologies, and without something like this it will be an 
operational nightmare. 

I envision the AutoScaleManager will be associated with the Topology when it is 
submitted to Nimbus. Could be part of the core Storm code base, or a custom 
user class.

There will be one (or more) strategically placed calls from the Topology 
management code during the management lifecycle, such as during 
deployment/delivery of code to Workers, to modify/enhance the topology. It will 
be in this phase that saved history or initial defaults will be applied to the 
Topology, altering parallelism, workers, ackers, queue sizes etc... This call 
will be a "black box" to core Storm code. The "saved history" will be the sole 
responsibility of the AutoScaleManager.

Then the AutoScaleManager must run as a daemon somewhere. I'm going to propose 
that it run as a Thread inside Nimbus. This will make management of the daemon 
easy for simple use cases. And when Nimbus solves the HA/failover issues, the 
AutoScaleManager take advantage of that solution as well. Whenever Nimbus is 
watching over a Topology, the AutoScaleManager for that Topology is running. We 
will need various lifecycle methods such as start and stop that Nimbus will 
call at the correct times.

When the AutoScaleManager needs to make changes, all of the Nimbus APIs are 
available. It could do whatever runtime changes are allowed, like alter 
parallelism. For large changes it will simply stop then start the Topology, the 
"enhance" callback will perform the changes. Furthermore, if it is making 
runtime changes, it will need to remember them so when code changes happen, 
they are used as the initial defaults for that code roll.

> Auto-Scaling Resources in a Topology
> ------------------------------------
>
>                 Key: STORM-594
>                 URL: https://issues.apache.org/jira/browse/STORM-594
>             Project: Apache Storm
>          Issue Type: New Feature
>            Reporter: HARSHA BALASUBRAMANIAN
>            Assignee: Pooyan Jamshidi
>            Priority: Minor
>         Attachments: Algorithm for Auto-Scaling.pdf, Project Plan and 
> Scope.pdf
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> A useful feature missing in Storm topologies is the ability to auto-scale 
> resources, based on a pre-configured metric. The feature proposed here aims 
> to build such a auto-scaling mechanism using a feedback system. A brief 
> overview of the feature is provided here. The finer details of the required 
> components and the scaling algorithm (uses a Feedback System) are provided in 
> the PDFs attached.
> Brief Overview:
> Topologies may get created with or (ideally) without parallelism hints and 
> tasks in their bolts and spouts, before submitting them, If auto-scaling is 
> set in the topology (using a Boolean flag), the topology will also get 
> submitted to the auto-scale module.
> The auto-scale module will read a pre-configured metric (threshold/min) from 
> a configuration file. Using this value, the topology's resources will be 
> modified till the threshold is reached. At each stage in the auto-scale 
> module's execution, feedback from the previous execution will be used to tune 
> the resources.
> The systems that need to be in place to achieve this are:
> 1. Metrics which provide the current threshold (no: of acks per minute) for a 
> topology's spouts and bolts.
> 2. Access to Storm's CLI tool which can change a topology's resources are 
> runtime.
> 3. A new java or clojure module which runs within the Nimbus daemon or in 
> parallel to it. This will be the auto-scale module.
> Limitations: (This is not an exhaustive list. More will be added as the 
> design matures. Also, some of the points here may get resolved)
> To test the feature there will be a number of limitations in the first 
> release. As the feature matures, it will be allowed to scale more
> 1. The auto-scale module will be limited to a few topologies (maybe 4 or 5 at 
> maximum)
> 2. New bolts will not be added to scale a topology. This feature will be 
> limited to increasing the resources within the existing topology.
> 3. Topology resources will not be decreased when it is running at more than 
> the required number (except for a few cases)
> 4. This feature will work only for long-running topologies where the input 
> threshold can become equal to or greater than the required threshold



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (STORM-594) Auto-Scaling Resources in a Topology

Reply via email to