soreana opened a new pull request, #8328:
URL: https://github.com/apache/cloudstack/pull/8328

   ### Description
   
   Occasionally the hostStats object for an agent becomes null in the management 
server. This is rare and we haven't found the root cause yet, but it does 
occur in our CloudStack deployments with many hosts.
   
   The hostStats object is null even though the agent is UP and hosting multiple 
VMs. The VM consoles remain accessible, and tasks can still be executed on the VMs.
   
   This pull request doesn't fix the underlying issue; instead, it exposes the 
affected hosts through the Prometheus exporter so that we can restart the agent 
and get the necessary information.
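
The idea can be illustrated with a minimal sketch. This is not the code from the pull request (the exporter itself is part of CloudStack's Java code base); the function name and the dictionary layout of a host are hypothetical, and only the metric name, labels, and the `-1` value are taken from the test output in this description.

```python
def missing_info_lines(hosts):
    """Build one metric line per host whose stats object is missing.

    `hosts` is a hypothetical structure: an iterable of dicts with the keys
    'zone', 'name' and 'stats', where 'stats' is None when the management
    server has no stats for that host.
    """
    lines = []
    for host in hosts:
        if host["stats"] is None:
            # Same shape as the line shown in the test output of this PR.
            lines.append(
                'cloudstack_host_missing_info{zone="%s",hostname="%s",filter="hostStats"} -1'
                % (host["zone"], host["name"])
            )
    return lines
```

Hosts that do have stats produce no line here, so an empty result means no host is affected.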
   
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [x] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   - [ ] build/CI
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [x] Major
   - [ ] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [x] Major
   - [ ] Minor
   - [ ] Trivial
   
   
   ### How Has This Been Tested?
   
   
   1. Set `prometheus.exporter.enable` to `true`.
   2. Run `curl localhost:9595/metrics` on the management server to make sure 
that the Prometheus exporter is working.
   3. Stop any agent.
   4. Run `curl localhost:9595/metrics | grep cloudstack_host_missing_info`. 
There is no output, because the host stats are still held in memory. (If you 
wait a couple of minutes, the management server may remove them.)
   5. Restart the management server to clear the cached host stats objects from 
memory.
   6. Run `curl localhost:9595/metrics | grep cloudstack_host_missing_info` 
again to get the following output:
   ```
   curl localhost:9595/metrics | grep cloudstack_host_missing_info
   
   cloudstack_host_missing_info{zone="testZone1",hostname="node01",filter="hostStats"} -1
   ```
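
The `grep` check in steps 4 and 6 can also be scripted. The following is a rough sketch, assuming the exporter output follows the usual Prometheus text format shown above; the helper name `hosts_missing_info` and the regular expressions are illustrative and not part of CloudStack.

```python
import re

# One sample line of the Prometheus text format, e.g.
#   name{label="value",other="value"} -1
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value"
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def hosts_missing_info(metrics_text, metric="cloudstack_host_missing_info"):
    """Return (zone, hostname, filter) tuples for every sample of `metric`."""
    found = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        found.append(
            (labels.get("zone"), labels.get("hostname"), labels.get("filter"))
        )
    return found
```

To use it against a running management server, fetch `http://localhost:9595/metrics` (for example with `urllib.request.urlopen`), decode the response body, and pass the text to `hosts_missing_info`. An empty list corresponds to the empty `grep` output in step 4.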
   
   #### How did you try to break this feature and the system with this change?
   
   This change should not affect other areas of the code, because the Prometheus 
module is a largely independent part of CloudStack.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
