[ 
https://issues.apache.org/jira/browse/STORM-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046096#comment-15046096
 ] 

ASF GitHub Bot commented on STORM-898:
--------------------------------------

Github user d2r commented on a diff in the pull request:

    https://github.com/apache/storm/pull/921#discussion_r46902477
  
    --- Diff: docs/documentation/Resource_Aware_Scheduler_overview.md ---
    @@ -0,0 +1,222 @@
    +# Introduction
    +
    +The purpose of this document is to provide a description of the Resource Aware Scheduler for the Storm distributed real-time computation system.  This document gives a high level description of the Resource Aware Scheduler in Storm and an overview of its API and configuration.
    +
    +## Using Resource Aware Scheduler
    +
    +The user can switch to using the Resource Aware Scheduler by setting the 
following in *conf/storm.yaml*
    +
    +    storm.scheduler: "backtype.storm.scheduler.resource.ResourceAwareScheduler"
    +
    +
    +## API Overview
    +
    +For a Storm Topology, the user can now specify the amount of resources a topology component (i.e. Spout or Bolt) requires to run a single instance of the component.  The user can specify the resource requirement for a topology component by using the following API calls.
    +
    +### Setting Memory Requirement
    +
    +API to set component memory requirement:
    +
    +    public T setMemoryLoad(Number onHeap, Number offHeap)
    +
    +Parameters:
    +* Number onHeap – The amount of on heap memory an instance of this component will consume in megabytes
    +* Number offHeap – The amount of off heap memory an instance of this component will consume in megabytes
    +
    +The user also has the option to specify just the on heap memory requirement if the component does not need off heap memory.
    +
    +    public T setMemoryLoad(Number onHeap)
    +
    +Parameters:
    +* Number onHeap – The amount of on heap memory an instance of this component will consume in megabytes
    +
    +If no value is provided for offHeap, 0.0 will be used. If no value is 
provided for onHeap, or if the API is never called for a component, the default 
value will be used.
    +
    +Example of Usage:
    +
    +    SpoutDeclarer s1 = builder.setSpout("word", new TestWordSpout(), 10);
    +    s1.setMemoryLoad(1024.0, 512.0);
    +    builder.setBolt("exclaim1", new ExclamationBolt(), 3)
    +                .shuffleGrouping("word").setMemoryLoad(512.0);
    +
    +The total memory requested for this topology is 16.5 GB: 10 spouts with 1 GB on heap memory and 0.5 GB off heap memory each, plus 3 bolts with 0.5 GB on heap memory each.
    +
    +### Setting CPU Requirement
    +
    +
    +API to set component CPU requirement:
    +
    +    public T setCPULoad(Double amount)
    +
    +Parameters:
    +* Number amount – The amount of CPU an instance of this component will consume.
    +
    +Currently, the amount of CPU resources a component requires, or that is available on a node, is represented by a point system. CPU usage is a difficult concept to define. Different CPU architectures perform differently depending on the task at hand, and they are complex enough that expressing all of that in a single precise portable number is impossible. Instead we take a convention-over-configuration approach and are primarily concerned with a rough level of CPU usage, while still allowing amounts to be specified at a finer granularity.
    +
    +By convention a CPU core typically will get 100 points. If you feel that your processors are more or less powerful you can adjust this accordingly. Heavy tasks that are CPU bound will get 100 points, as they can consume an entire core. Medium tasks should get 50, light tasks 25, and tiny tasks 10. In some cases a task spawns other threads to help with processing, and such tasks may need to go above 100 points to express the amount of CPU they use. If these conventions are followed, then in the common case of a single threaded task the reported Capacity * 100 should be the number of CPU points that the task needs.
    +
    +Example of Usage:
    +
    +    SpoutDeclarer s1 = builder.setSpout("word", new TestWordSpout(), 10);
    +    s1.setCPULoad(15.0);
    +    builder.setBolt("exclaim1", new ExclamationBolt(), 3)
    +                .shuffleGrouping("word").setCPULoad(10.0);
    +
    +### Limiting the Heap Size per Worker (JVM) Process
    +
    +
    +    public void setTopologyWorkerMaxHeapSize(Number size)
    +
    +Parameters:
    +* Number size – The memory limit a worker process will be allocated in 
megabytes
    +
    +The user can limit the amount of memory the Resource Aware Scheduler allocates to a single worker on a per topology basis by using the above API.  This API is in place so that users can spread executors across multiple workers.  However, spreading executors across multiple workers may increase communication latency, since executors will not be able to use the Disruptor Queue for intra-process communication.
    +
    +Example of Usage:
    +
    +    Config conf = new Config();
    +    conf.setTopologyWorkerMaxHeapSize(512.0);
    +
    +### Setting Available Resources on Node
    +
    +A storm administrator can specify node resource availability by modifying 
the *conf/storm.yaml* file located in the storm home directory of that node.
    +
    +A storm administrator can specify how much memory a node has available, in megabytes, by adding the following to *storm.yaml*
    +
    +    supervisor.memory.capacity.mb: [amount<Double>]
    +
    +A storm administrator can also specify how much CPU a node has available by adding the following to *storm.yaml*
    +
    +    supervisor.cpu.capacity: [amount<Double>]
    +
    +
    +Note that the amount the user specifies for the available CPU is expressed using the point system discussed earlier.
    +
    +Example of Usage:
    +
    +    supervisor.memory.capacity.mb: 20480.0
    +    supervisor.cpu.capacity: 100.0
    +
    +
    +### Other Configurations
    +
    +The user can set some default configurations for the Resource Aware 
Scheduler in *conf/storm.yaml*:
    +
    +    # default value if on heap memory requirement is not specified for a component
    +    topology.component.resources.onheap.memory.mb: 128.0
    +
    +    # default value if off heap memory requirement is not specified for a component
    +    topology.component.resources.offheap.memory.mb: 0.0
    +
    +    # default value if CPU requirement is not specified for a component
    +    topology.component.cpu.pcore.percent: 10.0
    +
    +    # default value for the max heap size for a worker
    +    topology.worker.max.heap.size.mb: 768.0
    +
    +# Topology Priorities and Per User Resource Guarantees
    +
    +The next step for the Resource Aware Scheduler (RAS) is to enable multitenant capabilities, since many Storm users typically share a Storm cluster.  The Resource Aware Scheduler needs to be able to allocate resources on a per user basis.  Each user can be guaranteed a certain amount of resources to run his or her topologies, and the Resource Aware Scheduler should meet those guarantees when possible.  When the Storm cluster has extra free resources, the Resource Aware Scheduler needs to be able to allocate the additional resources to users in a fair manner.  The importance of topologies can also vary.  Topologies can be used for actual production or just experimentation, thus the Resource Aware Scheduler should take into account the importance of a topology when determining the order in which to schedule topologies or when to evict topologies.
    +
    +## Setup
    +
    +The resource guarantees of a user can be specified in *conf/user-resource-pools.yaml*.  Specify the resource guarantees of a user in the following format:
    +
    +    resource.aware.scheduler.user.pools:
    +        [UserId]:
    +            cpu: [Amount of Guaranteed CPU Resources]
    +            memory: [Amount of Guaranteed Memory Resources]
    +
    +An example of what *user-resource-pools.yaml* can look like:
    +
    +    resource.aware.scheduler.user.pools:
    +        jerry:
    +            cpu: 1000
    +            memory: 8192.0
    +        derek:
    +            cpu: 10000.0
    +            memory: 32768
    +        bobby:
    +            cpu: 5000.0
    +            memory: 16384.0
    +
    +Please note that the specified amount of guaranteed CPU and memory can be either an integer or a double.
    +
    +## API Overview
    +### Specifying Topology Priority
    +Topology priorities can range from 0-29.  Topology priorities are partitioned into several priority levels that may each contain a range of priorities.
    +For example we can create a priority level mapping:
    +
    +    PRODUCTION => 0 – 9
    +    STAGING => 10 – 19
    +    DEV => 20 – 29
    +
    +Thus, each priority level contains 10 sub priorities. Users can set the priority level of a topology by using the following API:
    +
    +    conf.setTopologyPriority(int priority)
    +
    +Parameters:
    +* priority – an integer representing the priority of the topology
    +
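    +For instance, assuming the priority level mapping above, a DEV-level topology could be marked as follows (a minimal sketch using the API above):
    +
    +    Config conf = new Config();
    +    conf.setTopologyPriority(29);
    +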
    +### Specifying Scheduling Strategy
    +
    +A user can specify on a per topology basis which scheduling strategy to use.  Users can implement the IStrategy interface and define new strategies to schedule specific topologies.  This pluggable interface was created because we realize that different topologies might have different scheduling needs.  A user can set the topology strategy within the topology definition by using the API:
    +
    +    public void setTopologyStrategy(Class<? extends IStrategy> clazz)
    +    
    +Parameters:
    +* clazz – The strategy class that implements the IStrategy interface
    +
    +Example of Usage:
    +
    +    conf.setTopologyStrategy(backtype.storm.scheduler.resource.strategies.scheduling.DefaultResourceAwareStrategy.class);
    +
    +A default scheduling strategy is provided.  The DefaultResourceAwareStrategy is based on the scheduling algorithm in the original paper describing resource aware scheduling in Storm:
    +
    +http://web.engr.illinois.edu/~bpeng/files/r-storm.pdf
    +
    +### Specifying Topology Prioritization Strategy
    +
    +The order of scheduling is controlled by a pluggable interface in which a user can define a strategy that prioritizes topologies.  For a user to define his or her own prioritization strategy, he or she needs to implement the ISchedulingPriorityStrategy interface.  A user can set the scheduling priority strategy by setting *Config.RESOURCE_AWARE_SCHEDULER_PRIORITY_STRATEGY* to point to the class that implements the strategy. For instance:
    +
    +    resource.aware.scheduler.priority.strategy: 
"backtype.storm.scheduler.resource.strategies.priority.DefaultSchedulingPriorityStrategy"
    +
    +A default strategy is provided.  The following explains how the default scheduling priority strategy works.
    +
    +**DefaultSchedulingPriorityStrategy**
    +
    +The order of scheduling should be based on how far a user's current resource allocation is from his or her guaranteed allocation.  We should prioritize the users who are furthest away from their resource guarantee.  The difficulty of this problem is that a user may have multiple resource guarantees and another user can have another set of resource guarantees, so how can we compare them in a fair manner?  Let's use the average percentage of resource guarantees satisfied as the method of comparison.
    +
    +For example:
    +
    +|User|Resource Guarantee|Resource Allocated|
    +|----|------------------|------------------|
    +|A|<10 CPU, 50 GB>|<2 CPU, 40 GB>|
    +|B|<20 CPU, 25 GB>|<15 CPU, 10 GB>|
    +
    +User A’s average percentage satisfied of resource guarantee: 
    +
    +(2/10+40/50)/2  = 0.5
    +
    +User B’s average percentage satisfied of resource guarantee: 
    +
    +(15/20+10/25)/2  = 0.575
    +
    +In this example, User A has a smaller average percentage of his or her resource guarantee satisfied than User B.  Thus, User A should get priority to be allocated more resources, i.e., a topology submitted by User A should be scheduled first.
    +
    +When scheduling, RAS sorts users by the average percentage of their resource guarantee that is satisfied and schedules topologies from users based on that ordering, starting with the users with the lowest average percentage satisfied.  When a user's resource guarantee is completely satisfied, the user's average percentage satisfied will be greater than or equal to 1.
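    +
    +The following is a minimal sketch, not taken from the Storm codebase, of how this ordering metric could be computed for the two users in the table above; the class and method names are hypothetical and only illustrate the comparison described here.
    +
    +    // Hypothetical sketch: compute the average fraction of the resource guarantee satisfied per user.
    +    public class PriorityOrderSketch {
    +        static double averageSatisfied(double cpuAlloc, double cpuGuar, double memAlloc, double memGuar) {
    +            return (cpuAlloc / cpuGuar + memAlloc / memGuar) / 2.0;
    +        }
    +        public static void main(String[] args) {
    +            double a = averageSatisfied(2, 10, 40, 50);   // User A: (2/10 + 40/50) / 2 = 0.5
    +            double b = averageSatisfied(15, 20, 10, 25);  // User B: (15/20 + 10/25) / 2 = 0.575
    +            // The user with the lower score (A) is furthest from the guarantee and is scheduled first.
    +            System.out.println("A=" + a + " B=" + b);
    +        }
    +    }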
    +
    +### Specifying Eviction Strategy
    +The strategy for evicting topologies is also governed by a pluggable interface with which the user can implement his or her own topology eviction strategy.  For a user to implement his or her own eviction strategy, he or she needs to implement the IEvictionStrategy interface and set *Config.RESOURCE_AWARE_SCHEDULER_EVICTION_STRATEGY* to point to the implemented strategy class. For instance:
    +
    +    resource.aware.scheduler.eviction.strategy: 
"backtype.storm.scheduler.resource.strategies.eviction.DefaultEvictionStrategy"
    +
    +A default eviction strategy is provided.  The following explains how the default topology eviction strategy works.
    +
    +**DefaultEvictionStrategy**
    +
    +If the cluster is full, we need a mechanism to evict topologies so that 
user resource guarantees can be met and resources additional resource 
guarantees can be shared fairly among users
    --- End diff --
    
    `and resources` -> `and`


> Add priorities and per user resource guarantees to Resource Aware Scheduler
> ---------------------------------------------------------------------------
>
>                 Key: STORM-898
>                 URL: https://issues.apache.org/jira/browse/STORM-898
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: Robert Joseph Evans
>            Assignee: Boyang Jerry Peng
>         Attachments: Resource Aware Scheduler for Storm.pdf
>
>
> In a multi-tenant environment we would like to be able to give individual 
> users a guarantee of how much CPU/Memory/Network they will be able to use in 
> a cluster.  We would also like to know which topologies a user feels are the 
> most important to keep running if there are not enough resources to run all 
> of their topologies.
> Each user should be able to specify if their topology is production, staging, 
> or development. Within each of those categories a user should be able to give 
> a topology a priority, 0 to 10 with 10 being the highest priority (or 
> something like this).
> If there are not enough resources on a cluster to run a topology, assume this 
> topology is running using resources and find the user that is most over their 
> guaranteed resources.  Shoot the lowest priority topology for that user, and 
> repeat until this topology is able to run or this topology would be the one 
> shot.  Ideally we don't actually shoot anything until we know that we would 
> have made enough room.
> If the cluster is over-subscribed and everyone is under their guarantee, and 
> this topology would not put the user over their guarantee, shoot the lowest 
> priority topology in this worker's resource pool until there is enough room to 
> run the topology or this topology is the one that would be shot.  We might 
> also want to think about what to do if we are going to shoot a production 
> topology in an oversubscribed case; perhaps we can shoot a non-production 
> topology instead even if the other user is not over their guarantee.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
