onitake opened a new issue #3270: System VM health monitoring
URL: https://github.com/apache/cloudstack/issues/3270
 
 
   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and master branch are affected too.
   Always add information AFTER these HTML comments; there is no need to 
delete the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Feature Idea
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   VR, API
   ~~~
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on master 
branch.
   -->
   
   ~~~
   4.11
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, 
advanced networking, etc.  N/A otherwise
   -->
   N/A
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   N/A
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   In CloudStack 4.11, there seems to be no effective way to monitor the health 
of virtual routers and other system VMs.
   CloudStack does some internal health monitoring, but the result of these 
health checks is not exposed via the API.
   It would be very useful if system VMs (especially the VRs) or the API 
offered a built-in way to monitor their performance and health.
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal 
test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   An existing monitoring system, such as Icinga or Prometheus, would 
automatically query CloudStack for its networks and/or system VMs, then 
record performance data from each system VM.
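
The discovery half of this workflow could look roughly like the sketch below, which builds a signed CloudStack API URL for the existing `listRouters` command (`listSystemVms` works the same way). It assumes the standard CloudStack request signing scheme (HMAC-SHA1 over the sorted, lowercased query string). The endpoint, API key and secret key are placeholders, and `sign_request` / `build_url` are hypothetical helper names, not part of any CloudStack client library.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode


def sign_request(params, secret_key):
    """Return the base64 HMAC-SHA1 signature for a set of API parameters."""
    # Sort by lowercased field name, URL-encode the values, lowercase the
    # whole string, then sign it with the account's secret key.
    pairs = sorted(params.items(), key=lambda kv: kv[0].lower())
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in pairs)
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def build_url(endpoint, command, api_key, secret_key, **extra):
    """Build a signed GET URL for one API command."""
    params = {"command": command, "apiKey": api_key,
              "response": "json", **extra}
    signed = {**params, "signature": sign_request(params, secret_key)}
    # quote (not quote_plus) so spaces become %20 and '+' becomes %2B.
    return endpoint + "?" + urlencode(signed, quote_via=quote)


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    print(build_url("http://mgmt.example.invalid:8080/client/api",
                    "listRouters", "API_KEY", "SECRET_KEY"))
```

A monitoring system would fetch this URL periodically to learn which routers exist. The per-VM performance and health data requested in this issue would still have to come from somewhere else.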
   
   This would allow performance monitoring (such as CPU, memory and network 
load) and health checks to react to extraordinary conditions.
   
   Some of these measurements could be implemented on the hypervisor level, but 
others are inherent to the system VM and thus not easily monitorable in a 
hypervisor-agnostic way.
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   Aside from simple metrics such as those offered for regular VMs, where 
CloudStack queries the hypervisor and exposes the results via the API, 
additional data from the system VM services should be available.
   Alternatively, the system VMs could expose a monitoring endpoint (SNMP, 
Prometheus, NRPE, ...) that a monitoring system running on the backend 
network can query.
   
   Here's a non-exhaustive list of monitors that would be interesting:
   - CPU load
   - System load
   - Memory / cache / swap usage
   - Disk read / write throughput (bit/s) and IOPS
   - Per-network in / out traffic (bit/s and packet/s)
   - Number of NAT table (conntrack) entries
   - System health: disk usage %, swap usage %, rootfs mounted read-only
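
As a rough illustration of how a few of these monitors could be gathered inside a Linux-based system VM, the sketch below parses `/proc/loadavg`, `/proc/meminfo` and `/proc/mounts` and renders the result in the Prometheus text format. All function names and the `systemvm_` metric prefix are hypothetical. The conntrack count is assumed to be read from `/proc/sys/net/netfilter/nf_conntrack_count`, which is only present when connection tracking is loaded.

```python
def parse_loadavg(text):
    """Parse the three load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return {"load1": float(one), "load5": float(five), "load15": float(fifteen)}


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}, mostly in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def rootfs_readonly(mounts_text):
    """Return True if the last mount entry for '/' carries the 'ro' option."""
    readonly = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/":
            readonly = "ro" in fields[3].split(",")
    return readonly


def build_metrics(loadavg, meminfo, mounts, conntrack_count=None):
    """Combine the parsed samples into one flat metric dictionary."""
    mem = parse_meminfo(meminfo)
    metrics = dict(parse_loadavg(loadavg))
    metrics["memory_total_bytes"] = mem.get("MemTotal", 0) * 1024
    swap_used_kb = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    metrics["swap_used_bytes"] = swap_used_kb * 1024
    metrics["rootfs_readonly"] = int(rootfs_readonly(mounts))
    if conntrack_count is not None:
        metrics["conntrack_entries"] = int(conntrack_count)
    return metrics


def to_prometheus(metrics, prefix="systemvm_"):
    """Render metrics as 'name value' lines in the Prometheus text format."""
    lines = [f"{prefix}{name} {value}" for name, value in sorted(metrics.items())]
    return "\n".join(lines) + "\n"
```

On a real VM the three text arguments would come from reading the corresponding `/proc` files, and the rendered output would be served over HTTP or handed to whatever agent the monitoring platform uses. Disk and per-network counters are left out to keep the sketch short.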
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   Neither simple hypervisor-based metrics nor advanced metrics from the 
service are exposed.
   
   In the past, we manually installed additional monitoring services into each 
newly deployed VR, which were then queried from our central monitoring 
platform. But such a setup can be brittle, can lead to incompatibilities and 
performance problems, and needs manual attention to stay up to date.
