[
https://issues.apache.org/jira/browse/STORM-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim updated STORM-1766:
--------------------------------
Fix Version/s: 2.0.0
> A better algorithm server rack selection for RAS
> ------------------------------------------------
>
> Key: STORM-1766
> URL: https://issues.apache.org/jira/browse/STORM-1766
> Project: Apache Storm
> Issue Type: Improvement
> Reporter: Boyang Jerry Peng
> Assignee: Boyang Jerry Peng
> Fix For: 2.0.0
>
>
> Currently the getBestClustering algorithm for RAS finds the "Best"
> cluster/rack based on which rack has the most available resources this may be
> insufficient and may cause topologies not to be able to be scheduled
> successfully even though there are enough resources to schedule it in the
> cluster. We attempt to find the rack with the most resources by find the rack
> with the biggest sum of available memory + available cpu. This method is not
> effective since it does not consider the number of slots available. This
> method also fails in identifying racks that are not schedulable due to the
> exhaustion of one of the resources either memory, cpu, or slots. The current
> implementation also tries the initial scheduling on one rack and not try to
> schedule on all the racks before giving up which may cause topologies to be
> failed to be scheduled due to the above mentioned shortcomings in the current
> method. Also the current method does not consider failures of workers. When
> executors of a topology gets unassigned and needs to be scheduled again, the
> current logic in getBestClustering may be inadequate if not complete wrong.
> When executors needs to rescheduled due to a fault, getBestClustering will
> likely return a cluster that is different from where the majority of
> executors from the topology is originally scheduling in.
> Thus, I propose a different strategy/algorithm to find the "best" cluster. I
> have come up with a ordering strategy I dub subordinate resource availability
> ordering (inspired by Dominant Resource Fairness) that sorts racks by the
> subordinate (not dominant) resource availability.
> For example given 4 racks with the following resource availabilities
> {code}
> //generate some that has alot of memory but little of cpu
> rack-3 Avail [ CPU 100.0 MEM 200000.0 Slots 40 ] Total [ CPU 100.0 MEM
> 200000.0 Slots 40 ]
> //generate some supervisors that are depleted of one resource
> rack-2 Avail [ CPU 0.0 MEM 80000.0 Slots 40 ] Total [ CPU 0.0 MEM 80000.0
> Slots 40 ]
> //generate some that has a lot of cpu but little of memory
> rack-4 Avail [ CPU 6100.0 MEM 10000.0 Slots 40 ] Total [ CPU 6100.0 MEM
> 10000.0 Slots 40 ]
> //generate another rack of supervisors with less resources than rack-0
> rack-1 Avail [ CPU 2000.0 MEM 40000.0 Slots 40 ] Total [ CPU 2000.0 MEM
> 40000.0 Slots 40 ]
> rack-0 Avail [ CPU 4000.0 MEM 80000.0 Slots 40( ] Total [ CPU 4000.0 MEM
> 80000.0 Slots 40 ]
> Cluster Overall Avail [ CPU 12200.0 MEM 410000.0 Slots 200 ] Total [ CPU
> 12200.0 MEM 410000.0 Slots 200 ]
> {code}
> It is clear that rack-0 is the best cluster since its the most balanced and
> can potentially schedule the most executors, while rack-2 is the worst rack
> since rack-2 is depleted of cpu resource thus rendering it unschedulable even
> though there are other resources available.
> We first calculate the resource availability percentage of all the racks for
> each resource by computing:
> {code}
> (resource available on rack) / (resource available in cluster)
> {code}
> We do this calculation to normalize the values otherwise the resource values
> would not be comparable.
> So for our example:
> {code}
> rack-3 Avail [ CPU 0.819672131147541% MEM 48.78048780487805% Slots 20.0% ]
> effective resources: 0.00819672131147541
> rack-2 Avail [ 0.0% MEM 19.51219512195122% Slots 20.0% ] effective resources:
> 0.0
> rack-4 Avail [ CPU 50.0% MEM 2.4390243902439024% Slots 20.0% ] effective
> resources: 0.024390243902439025
> rack-1 Avail [ CPU 16.39344262295082% MEM 9.75609756097561% Slots 20.0% ]
> effective resources: 0.0975609756097561
> rack-0 Avail [ CPU 32.78688524590164% MEM 19.51219512195122% Slots 20.0% ]
> effective resources: 0.1951219512195122
> {code}
> The effective resource of a rack, which is also the subordinate resource, is
> computed by:
> {code}
> MIN(resource availability percentage of {CPU, Memory, # of free Slots}).
> {code}
> Then we order the racks by the effective resource.
> Thus for our example:
> {code}
> Sorted rack: [rack-0, rack-1, rack-4, rack-3, rack-2]
> {code}
> Also to deal with the presence of failures, if a topology is partially
> scheduled, we find the rack with the most scheduled executors for the
> topology and we try to schedule on that rack first.
> Thus for the sorting for racks. We first sort by the number of executors
> already scheduled on the rack and then by the subordinate resource
> availability.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)