btzq opened a new issue, #9336:
URL: https://github.com/apache/cloudstack/issues/9336
<!--
Verify first that your issue/request is not already reported on GitHub.
Also test if the latest release and main branch are affected too.
Always add information AFTER these HTML comments, but there is no need to delete
the comments.
-->
##### ISSUE TYPE
<!-- Pick one below and delete the rest -->
* Bug Report
##### COMPONENT NAME
<!--
Categorize the issue, e.g. API, VR, VPN, UI, etc.
-->
~~~
Autoscale
~~~
##### CLOUDSTACK VERSION
<!--
New line separated list of affected versions, commit ID for issues on main
branch.
-->
~~~
4.19.0
~~~
##### CONFIGURATION
<!--
Information about the configuration if relevant, e.g. basic network,
advanced networking, etc. N/A otherwise
-->
##### OS / ENVIRONMENT
<!--
Information about the environment if relevant, N/A otherwise
-->
##### SUMMARY
<!-- Explain the problem/feature briefly -->
##### STEPS TO REPRODUCE
<!--
For bugs, show exactly how to reproduce the problem, using a minimal
test-case. Use Screenshots if accurate.
For new features, show how the feature would be used.
-->
We are actively using Autoscale Groups with the following scale-down rule (an equivalent API sketch is shown below).

Autoscale Rule
- Name: ScaleDownPolicy-0
- Duration (in sec): 60
- Quiet Time (in sec): 30

Conditions:
- Counter: VM CPU - Average Percentage
- Operator: Less Than
- Threshold: 35
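For reference, here is a rough sketch of how an equivalent scale-down policy could be created through the API, using the third-party `cs` Python client. The endpoint, credentials and the exact counter name are assumptions, and the create* calls are asynchronous, so the condition id below is a placeholder rather than a real response field:

~~~python
from cs import CloudStack

# Assumed endpoint and credentials; replace with your own.
api = CloudStack(endpoint="https://cloudstack.example.com/client/api",
                 key="API_KEY", secret="SECRET_KEY")

# Look up the "VM CPU - Average Percentage" counter (exact name may vary by version).
counters = api.listCounters().get("counter", [])
cpu_counter = next(c for c in counters if "cpu" in c["name"].lower())

# Condition: average CPU less than 35%. createCondition is asynchronous, so the
# condition id has to be taken from the job result (queryAsyncJobResult).
api.createCondition(counterid=cpu_counter["id"],
                    relationaloperator="LT",
                    threshold=35)

# Scale-down policy: 60s duration, 30s quiet time, bound to the condition above.
api.createAutoScalePolicy(action="scaledown",
                          conditionids="<condition-id-from-job-result>",
                          duration=60,
                          quiettime=30)
~~~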
This rule works well on normal days. Today, however, we had a node failure, and that node was hosting some of the autoscale VMs.

As a result, we hit the issue that has already been reported here:
- https://github.com/apache/cloudstack/issues/9145

For the VMs that were not 'Orphaned' and made their way back into the Autoscale Group (or simply weren't affected), I noticed that the scale-down rule no longer worked.
The group looked like this:
- Min: 2 members
- Max: 6 members
- Available Instances: 6 _<- this should have been 2 instead_

Even after 5 minutes it still did not scale down and all VMs kept running. I verified that the CPU utilisation of every VM was consistently only around 1%.
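For reference, one way to pull those CPU figures from the API (a sketch that assumes the `listVirtualMachinesMetrics` call and reuses the hypothetical `cs` client from the sketch above; field names may differ slightly between versions):

~~~python
# Reuses the `api` client from the earlier sketch.
metrics = api.listVirtualMachinesMetrics(listall=True)
for vm in metrics.get("virtualmachine", []):
    # cpuused is reported as a percentage string, e.g. "1%"
    print(vm.get("name"), vm.get("state"), vm.get("cpuused"))
~~~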
To get the service back to normal, I had to (see the API sketch after this list):
- Disable the Autoscale Group
- Delete the VMs
- Re-enable the Autoscale Group
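The same workaround can also be scripted against the API, roughly as below. The group and VM ids are placeholders, and these commands are asynchronous, so a real script would poll queryAsyncJobResult between the steps:

~~~python
# Reuses the `api` client from the earlier sketch.
group_id = "<autoscale-vm-group-id>"  # placeholder

# 1. Disable the autoscale group so it stops managing its members.
api.disableAutoScaleVmGroup(id=group_id)

# 2. Destroy the stuck VMs (ids are placeholders).
for vm_id in ["<vm-id-1>", "<vm-id-2>"]:
    api.destroyVirtualMachine(id=vm_id, expunge=True)

# 3. Re-enable the group; it then provisions members again and the
#    scale-down policy is applied as expected.
api.enableAutoScaleVmGroup(id=group_id)
~~~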
After that, the scale-down rule worked again and the group returned to the expected state:
- Min: 2 members
- Max: 6 members
- Available Instances: 2
<!-- Paste example playbooks or commands between quotes below -->
~~~
NA
~~~
<!-- You can also paste gist.github.com links for larger files -->
##### EXPECTED RESULTS
<!-- What did you expect to happen when running the steps above? -->
~~~
The Autoscale Group should have scaled back down towards the minimum of 2 members
once the average CPU of its VMs stayed below the 35% threshold.
~~~
##### ACTUAL RESULTS
<!-- What actually happened? -->
<!-- Paste verbatim command output between quotes below -->
~~~
For the VMs that were not 'Orphaned' and made their way back into the Autoscale Group
(or just happened to be on a host that was not affected), the scale-down rule no longer
worked.
~~~