[PR] Feature: Add support for DRS in a Cluster [cloudstack]

via GitHub Mon, 23 Oct 2023 02:42:52 -0700


vishesh92 opened a new pull request, #7723:
URL: https://github.com/apache/cloudstack/pull/7723


   Documentation PR - 
https://github.com/apache/cloudstack-documentation/pull/334
   
   ### Description
   
   This pull request (PR) implements a Distributed Resource Scheduler (DRS) for 
a CloudStack cluster. The feature enables automatic resource optimization and 
workload balancing within a cluster by live migrating VMs according to the 
cluster's configuration.
   Administrators can also execute DRS manually for a cluster, using the UI or 
the API.
   The PR adds two algorithms, `condensed` and `balanced`. Algorithms are 
pluggable, so ACS administrators can supply their own to customize scheduling.
   
   **Implementation**
   There are three top level components:
   
   1. **Scheduler**
   A timer task which:
   * Generates DRS plans for clusters
   * Processes DRS plans
   * Removes old DRS plan records
   
   2. **DRS Execution**
   We go through each VM in the cluster and use the specified algorithm to 
check whether DRS is required and to calculate the `cost`, `benefit` & 
`improvement` of migrating that VM to another host in the cluster. Based on 
these three values, the best migration is selected for the current iteration 
and the VM is migrated. The maximum number of iterations (live migrations) 
allowed on the cluster is set by `drs.iterations`, expressed as a fraction (a 
value between 0 and 1) of the total number of workloads.
   
   3. **Algorithm** 
   Every algorithm implements two methods: 
       1.  `needsDrs` - to check whether DRS is required for the cluster
       2.  `getMetrics` - to calculate the `cost`, `benefit` & `improvement` of 
migrating a VM to another host.
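
The scheduling loop and the two-method algorithm contract described above can be illustrated with a minimal Python sketch. It is not the PR's Java code. The names `Metrics`, `DrsAlgorithm`, `generate_plan` and `CountBalanced` are hypothetical, and the selection rule (keep only migrations with positive `improvement` and `benefit` greater than `cost`, then take the highest `improvement`) is an assumption, since the description does not state how the three values are combined.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    cost: float
    benefit: float
    improvement: float


class DrsAlgorithm(ABC):
    """Hypothetical mirror of the needsDrs / getMetrics contract."""

    @abstractmethod
    def needs_drs(self, placement: dict) -> bool:
        """Return True if the cluster still requires DRS."""

    @abstractmethod
    def get_metrics(self, placement: dict, vm: str, host: str) -> Metrics:
        """Return cost, benefit and improvement of moving vm to host."""


class CountBalanced(DrsAlgorithm):
    """Toy algorithm (invented for this sketch): even out VM counts per host."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def _spread(self, placement):
        counts = [sum(1 for h in placement.values() if h == host)
                  for host in self.hosts]
        return max(counts) - min(counts)

    def needs_drs(self, placement):
        return self._spread(placement) > 1

    def get_metrics(self, placement, vm, host):
        after = {**placement, vm: host}
        gain = float(self._spread(placement) - self._spread(after))
        return Metrics(cost=0.0, benefit=gain, improvement=gain)


def generate_plan(algorithm, placement, hosts, iterations_fraction):
    """Greedy plan: choose the single best migration in each iteration."""
    placement = dict(placement)  # work on a copy, leave the input untouched
    # Iteration cap is a fraction (0..1) of the number of VMs in the cluster.
    max_iterations = int(len(placement) * iterations_fraction)
    plan = []
    for _ in range(max_iterations):
        if not algorithm.needs_drs(placement):
            break
        best = None
        for vm, current_host in placement.items():
            for host in hosts:
                if host == current_host:
                    continue
                metrics = algorithm.get_metrics(placement, vm, host)
                # Assumed filter: the move must help and outweigh its cost.
                if metrics.improvement <= 0 or metrics.benefit <= metrics.cost:
                    continue
                if best is None or metrics.improvement > best[2].improvement:
                    best = (vm, host, metrics)
        if best is None:
            break
        vm, host, _ = best
        placement[vm] = host  # simulate the live migration
        plan.append((vm, host))
    return plan
```

With four VMs on one of two hosts and a fraction of 0.5, at most two migrations are planned; a different algorithm can be plugged in by subclassing `DrsAlgorithm`.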
   
   **Algorithms**
   1. `Condensed` - Packs the VMs onto the minimum number of hosts in the cluster.
   2. `Balanced` - Distributes the VMs evenly across hosts in the cluster. 
   Algorithms use `drs.level` to decide the amount of imbalance to allow in the 
cluster.
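
One way such a check could work is sketched below. Everything in it is an assumption made for illustration: the description does not say how imbalance is measured, so the sketch uses the coefficient of variation of per-host usage, derives a threshold as `1 - level`, and treats `balanced` as needing DRS when imbalance is above the threshold and `condensed` when it is below. The function names are hypothetical.

```python
from statistics import mean, pstdev


def cluster_imbalance(used_fraction_per_host):
    """Assumed measure: standard deviation of host usage divided by its mean."""
    average = mean(used_fraction_per_host)
    if average == 0:
        return 0.0  # an empty cluster is treated as perfectly balanced
    return pstdev(used_fraction_per_host) / average


def needs_drs(algorithm, used_fraction_per_host, level):
    """Decide whether DRS should run, given a level between 0.0 and 1.0."""
    threshold = 1.0 - level  # assumption: level 1.0 tolerates no imbalance
    imbalance = cluster_imbalance(used_fraction_per_host)
    if algorithm == "balanced":
        return imbalance > threshold  # too uneven, spread the VMs out
    if algorithm == "condensed":
        return imbalance < threshold  # too even, pack the VMs together
    raise ValueError(f"unknown algorithm: {algorithm}")
```

For example, with a level of 0.5, two hosts at 90% and 10% memory usage would trigger `balanced` but not `condensed` under these assumptions.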
   
   
   #### APIs Added
   _listClusterDrsPlan_
   > `id` - ID of the DRS plan to list
   > `clusterid` - ID of the cluster whose DRS plans are to be listed
   
   _generateClusterDrsPlan_
   >`id` - ID of the cluster
   >`iterations` - The maximum number of iterations in a DRS job, defined as a 
percentage (as a value between 0 and 1) of the total number of workloads. 
Defaults to the value of the cluster's `drs.iterations` setting.
   
   _executeClusterDrsPlan_
   > `id` - ID of the cluster for which DRS plan is to be executed.
   > `migrateto` - Specifies the mapping between each VM and the host it should 
be migrated to. Format of this parameter: 
`migrateto[vm-index].vm=<uuid>&migrateto[vm-index].host=<uuid>`.
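
As an illustration of the `migrateto` format, the following Python sketch builds the query string for `executeClusterDrsPlan`. The helper name `execute_plan_params` and the placeholder IDs are hypothetical, and the usual CloudStack request authentication (API key and signature) is left out.

```python
from urllib.parse import urlencode


def execute_plan_params(cluster_id, migrations):
    """Build request parameters from a list of (vm_uuid, host_uuid) pairs."""
    params = {"command": "executeClusterDrsPlan", "id": cluster_id}
    for index, (vm_id, host_id) in enumerate(migrations):
        params[f"migrateto[{index}].vm"] = vm_id
        params[f"migrateto[{index}].host"] = host_id
    return params


# Placeholder identifiers; real calls would use the UUIDs of the resources.
params = execute_plan_params("cluster-uuid", [("vm-a", "host-1"), ("vm-b", "host-2")])
# safe="[]" keeps the brackets readable; percent-encoding them is also valid.
query = urlencode(params, safe="[]")
print(query)
```

The indexed keys keep each VM paired with its destination host, which is why the same `vm-index` appears in both the `.vm` and `.host` entries.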
   
   #### Config Keys Added
    - ClusterDrsPlanExpireInterval
   **Key** `drs.plan.expire.interval`
   **Scope** `Global`
   **Default Value** `30` days
   **Description** The interval in days after which old DRS records will be 
cleaned up.
   
    - ClusterDrsEnabled
   **Key** `drs.automatic.enable`
   **Scope** `Cluster`
   **Default Value** `false`
   **Description** Enable/disable automatic DRS on a cluster.
   
    - ClusterDrsInterval
   **Key** `drs.automatic.interval`
   **Scope** `Cluster`
   **Default Value** `60` minutes
   **Description** The interval in minutes after which a periodic background 
thread will schedule DRS for a cluster.
   
    - ClusterDrsIterations
   **Key** `drs.max.migrations`
   **Scope** `Cluster`
   **Default Value** `50`
   **Description** Maximum number of live migrations in a DRS execution.
   
    - ClusterDrsAlgorithm
   **Key** `drs.algorithm`
   **Scope** `Cluster`
   **Default Value** `condensed`
   **Description** DRS algorithm to execute on the cluster. This PR implements 
two algorithms - `balanced` & `condensed`.
   
    - ClusterDrsLevel
   **Key** `drs.imbalance`
   **Scope** `Cluster`
   **Default Value** `0.5`
   **Description** Percentage (as a value between 0.0 and 1.0) of imbalance 
allowed in the cluster. 1.0 means no imbalance is allowed and 0.0 means any 
imbalance is allowed.
   
    - ClusterDrsMetric
   **Key** `drs.imbalance.metric`
   **Scope** `Cluster`
   **Default Value** `memory`
   **Description** The cluster imbalance metric to use when checking the 
`drs.imbalance.threshold`. Possible values are `memory` and `cpu`.
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [x] New feature (non-breaking change which adds functionality)
   - [ ] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [x] Major
   - [ ] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [ ] Minor
   - [ ] Trivial
   
   
   ### Screenshots (if appropriate):
   
   
   ### How Has This Been Tested?
   


