onitake opened a new issue #3270: System VM health monitoring

URL: https://github.com/apache/cloudstack/issues/3270

##### ISSUE TYPE

* Feature Idea

##### COMPONENT NAME

~~~
VR, API
~~~

##### CLOUDSTACK VERSION

~~~
4.11
~~~

##### CONFIGURATION

N/A

##### OS / ENVIRONMENT

N/A

##### SUMMARY

In CloudStack 4.11, there seems to be no effective way to monitor the health of virtual routers (VRs) and other system VMs. CloudStack performs some internal health monitoring, but the results of these checks are not exposed via the API.

It would be very useful if the system VMs (especially the VRs) or the API offered a built-in way to monitor their performance and health.

##### STEPS TO REPRODUCE

An existing monitoring system, such as Icinga or Prometheus, would automatically query CloudStack for its networks and/or system VMs, then collect performance data from each system VM. This would enable performance monitoring (CPU, memory and network load) and health checks that react to abnormal conditions.
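As a rough illustration of the discovery step, the sketch below shows how a monitoring system could list the VRs through the CloudStack API today. It assumes the documented API-key request signing (sorted, URL-encoded, lowercased query string signed with HMAC-SHA1 and Base64-encoded) and the `listRouters` command. The endpoint URL, the keys and the `listroutersresponse` → `router` response layout are assumptions, and the helpers `build_signed_url`, `summarize_routers` and `fetch_routers` are hypothetical names. Note that this only returns the state CloudStack itself tracks, not the in-VM health data requested here.

~~~python
import base64
import hashlib
import hmac
import json
import urllib.parse
import urllib.request


def build_signed_url(endpoint, params, api_key, secret_key):
    """Return a CloudStack API URL signed with the API key's secret.

    Steps: add apikey and response=json, sort the parameters by name,
    URL-encode each value (spaces as %20), lowercase the resulting string,
    HMAC-SHA1 it with the secret key, Base64-encode the digest and append
    it (URL-encoded) as the signature parameter.
    """
    all_params = dict(params, apikey=api_key, response="json")
    query = "&".join(
        f"{name}={urllib.parse.quote(str(value), safe='')}"
        for name, value in sorted(all_params.items(), key=lambda item: item[0].lower())
    )
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"


def summarize_routers(response_text):
    """Extract (name, state) pairs from a listRouters JSON response.

    Assumes the listroutersresponse -> router layout; missing keys yield [].
    """
    payload = json.loads(response_text)
    routers = payload.get("listroutersresponse", {}).get("router", [])
    return [(router.get("name"), router.get("state")) for router in routers]


def fetch_routers(endpoint, api_key, secret_key):
    """Query listRouters across all accounts (admin keys assumed)."""
    url = build_signed_url(
        endpoint, {"command": "listRouters", "listall": "true"}, api_key, secret_key
    )
    with urllib.request.urlopen(url, timeout=10) as response:
        return summarize_routers(response.read().decode("utf-8"))
~~~

A monitoring system could call `fetch_routers("http://management.example:8080/client/api", API_KEY, SECRET_KEY)` (hypothetical host and keys) on a schedule to learn which VRs exist and then probe each one; keeping signing and parsing separate from the HTTP call makes both testable offline.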
Some of these measurements could be implemented at the hypervisor level, but others are inherent to the system VM and thus hard to monitor in a hypervisor-agnostic way.

##### EXPECTED RESULTS

In addition to simple metrics like those already offered for regular VMs, where CloudStack queries the hypervisor and exposes the values via the API, additional data from the services running inside the system VM should be available.

Alternatively, the system VMs could expose a monitoring service (SNMP, Prometheus, NRPE, ...) that can be queried by a monitoring system running on the backend network.

Here's a non-exhaustive list of monitors that would be interesting:

- CPU load
- System load
- Memory / cache / swap usage
- Disk read / write (bit/s, IOPS)
- Per-network in / out (bit/s, packets/s)
- Number of NAT table (conntrack) entries
- System health: disk usage %, swap usage %, rootfs mounted read-only

##### ACTUAL RESULTS

Neither simple hypervisor-based metrics nor advanced metrics from the system VM services are exposed.

In the past, we manually installed additional monitoring services into each newly deployed VR, which were then queried by our central monitoring platform. But such a setup can be brittle, can lead to incompatibilities and performance problems, and needs manual attention to stay up to date.
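To make the "monitoring service inside the system VM" alternative more concrete, here is a minimal sketch of a Prometheus-style exporter covering a few items from the list of monitors above (system load, memory/swap, conntrack entries). It is not part of CloudStack: the metric names, port 9100, the `/metrics` path and all function names are hypothetical choices, and it assumes a Linux system VM where `/proc/sys/net/netfilter/nf_conntrack_count` exists (i.e. the conntrack module is loaded, as it normally is when NAT is in use).

~~~python
import http.server

# Hypothetical metric names; nothing here is an existing CloudStack interface.
MEMORY_FIELDS = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")


def parse_loadavg(text):
    """Parse the 1/5/15-minute load averages from /proc/loadavg."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return {"1m": one, "5m": five, "15m": fifteen}


def parse_meminfo(text):
    """Parse /proc/meminfo into a dict; values given in kB become bytes."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024
        values[key] = value
    return values


def render_metrics(loadavg, meminfo, conntrack_entries):
    """Render the values in the Prometheus text exposition format."""
    lines = [
        "# HELP sysvm_load_average System load average.",
        "# TYPE sysvm_load_average gauge",
    ]
    lines += [f'sysvm_load_average{{window="{w}"}} {v}' for w, v in loadavg.items()]
    lines += [
        "# HELP sysvm_memory_bytes Memory and swap figures from /proc/meminfo.",
        "# TYPE sysvm_memory_bytes gauge",
    ]
    lines += [
        f'sysvm_memory_bytes{{kind="{kind}"}} {meminfo[kind]}'
        for kind in MEMORY_FIELDS
        if kind in meminfo
    ]
    lines += [
        "# HELP sysvm_conntrack_entries Entries in the conntrack (NAT) table.",
        "# TYPE sysvm_conntrack_entries gauge",
        f"sysvm_conntrack_entries {conntrack_entries}",
    ]
    return "\n".join(lines) + "\n"


def read_file(path):
    with open(path) as handle:
        return handle.read()


def collect():
    """Read live values from /proc; nf_conntrack_count needs the conntrack module."""
    return render_metrics(
        parse_loadavg(read_file("/proc/loadavg")),
        parse_meminfo(read_file("/proc/meminfo")),
        int(read_file("/proc/sys/net/netfilter/nf_conntrack_count")),
    )


class MetricsHandler(http.server.BaseHTTPRequestHandler):
    """Serve the current metrics on GET /metrics and 404 elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(address=("127.0.0.1", 9100)):
    """Serve forever; in practice bind to the backend-facing address only."""
    http.server.HTTPServer(address, MetricsHandler).serve_forever()
~~~

Keeping the parsers separate from file access allows them to be tested without a real `/proc`. Disk and per-interface counters (for example from `/proc/diskstats` and `/proc/net/dev`) could be added in the same way, and binding only to the backend network avoids exposing the endpoint to guest or public networks.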
