Hi all,

In the last 2.5 days I was exploring if and how we can integrate collectd and 

The final picture could look like:
1. collectd does all the monitoring and reporting that Vdsm currently does
2. Engine consumes data from collectd
3. Vdsm consumes *notifications* from collectd, for a few but important tasks
like drive high water mark monitoring

Benefits (aka: why to bother?):
1. less code in Vdsm / long-awaited modularization of Vdsm
2. better integration with the system, reuse of well-known components
3. more flexibility in monitoring/reporting: collectd is an existing,
special-purpose solution
4. faster, more scalable operation because all the monitoring can be done in C

At first glance, Collectd seems to have all the tools we need.
1. A plugin interface (https://collectd.org/wiki/index.php/Plugin_architecture 
and https://collectd.org/wiki/index.php/Table_of_Plugins)
2. Support for notifications and thresholds 
3. A libvirt plugin (https://collectd.org/wiki/index.php/Plugin:virt)

So, the picture looks like this:

1. we start requiring collectd as dependency of Vdsm
2. we either configure it appropriately (collectd supports config drop-ins:
/etc/collectd.d) or we document our requirements (or both)
3. collectd monitors the hosts and libvirt
4. Engine polls collectd
5. Vdsm listens for notifications

Should libvirt deliver us the event we need (see 
we can just stop using collectd notifications, everything else works as 

1. Collectd does NOT consider the plugin API stable 
   so the plugins should be included in the main tree, much like the modules
of the Linux kernel.
   It is worth mentioning that the plugin API itself has a good deal of rough edges.
   We will need to maintain this plugin ourselves, *and* we need to maintain
our thin API
   layer, to make sure the plugin loads and works with recent versions of 
2. the virt plugin is out of date, doesn't report some data we need: see 
3. the notification messages are tailored for human consumption; they are
not easy
   for machines to parse.
4. the threshold support in collectd seems to match values against constants;
it doesn't seem possible
   to match a value against another one, as we need to do for high water
mark monitoring (capacity vs. allocation).
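
To illustrate point 4: the high water mark check compares two values taken
from the same sample (allocation against capacity), rather than one value
against a fixed constant. A minimal Python sketch follows; the class, the
function, the field names and the 512 MiB margin are illustrative assumptions,
not Vdsm's or collectd's actual code.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass(frozen=True)
class DriveSample:
    """One sample for a drive; the field names are illustrative."""
    name: str
    capacity: int    # bytes currently available to the drive
    allocation: int  # bytes actually written so far


def needs_extension(sample, min_free=512 * MiB):
    """Return True when the free space drops below min_free.

    The result depends on two values from the same sample, which is
    what a threshold against a constant cannot express.
    The default margin is an assumption made for this example.
    """
    return sample.capacity - sample.allocation < min_free


samples = [
    DriveSample("vda", capacity=2048 * MiB, allocation=1800 * MiB),
    DriveSample("vdb", capacity=2048 * MiB, allocation=1000 * MiB),
]
# Only drives whose free space is below the margin are selected.
to_extend = [s.name for s in samples if needs_extension(s)]
```

Here only "vda" is selected, because its free space (248 MiB) is below the
margin, while "vdb" still has more than 1 GiB free.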

How I'm addressing, or how I plan to address those challenges (aka action 
1. I've been experimenting with out-of-tree plugins, and I managed to develop,
build, install and run
   one out-of-tree plugin: https://github.com/mojaves/vmon/tree/master/collectd
   The development pace of collectd looks sustainable, so this doesn't look
like such a big deal.
   Furthermore, we can engage with upstream to merge our plugins, either as-is 
or to extend existing ones.
2. write another collectd plugin based on the Vdsm Python code and/or my past
accelerator executable project
3. patch the collectd notification code. It is yet another plugin
4. send notifications from the new virt module as per #2, bypassing the
threshold system. This move could prevent
   the new virt module from being merged in the collectd tree.
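
For items 3 and 4, one possible shape for a machine-friendly notification is
a small JSON document instead of free-form text. The Python sketch below is
only an assumption about what such a payload could look like; the event name,
the field names and both helper functions are hypothetical and are not part
of collectd or Vdsm.

```python
import json
import time

# Fields the (hypothetical) consumer expects in every notification.
REQUIRED_FIELDS = {"event", "vm_id", "drive", "capacity", "allocation"}


def encode_notification(vm_id, drive, capacity, allocation, timestamp=None):
    """Serialize a high water mark event as a JSON string."""
    payload = {
        "event": "drive_high_water_mark",
        "vm_id": vm_id,
        "drive": drive,
        "capacity": capacity,
        "allocation": allocation,
        "time": time.time() if timestamp is None else timestamp,
    }
    return json.dumps(payload, sort_keys=True)


def decode_notification(message):
    """Parse a notification and reject payloads with missing fields."""
    data = json.loads(message)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError("missing fields: %s" % ", ".join(sorted(missing)))
    return data
```

With a payload like this, the listener does not need to parse a
human-oriented message: it decodes the JSON and reads capacity and allocation
directly.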

Current status of the action items:
1. done BUT PoC quality
2. to be done (more work than #1; possible duplicate of the GitHub issue)
3. need more investigation, conflicts with #4
4. need more investigation, conflicts with #3

All the code I'm working on can be found at https://github.com/mojaves/vmon

Comments are appreciated

Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani