On Tue, Oct 11, 2016 at 2:05 PM, Francesco Romani <[email protected]> wrote:
> Hi all,
>
> In the last 2.5 days I was exploring if and how we can integrate collectd and
> Vdsm.
Some comments regarding storage high watermarks only. I will comment later on other aspects.

> The final picture could look like:
> 1. collectd does all the monitoring and reporting currently Vdsm does
> 2. Engine consumes data from collectd
> 3. Vdsm consumes *notifications* from collectd - for a few but important tasks
> like Drive high water mark monitoring

Drive high watermark monitoring is our core business; we cannot outsource it to collectd. Vdsm will always monitor high watermarks directly from libvirt.

> Benefits (aka: why to bother?):
> 1. less code in Vdsm / long-awaited modularization of Vdsm
> 2. better integration with the system, reuse of well-known components
> 3. more flexibility in monitoring/reporting: collectd is a special-purpose,
> existing solution
> 4. faster, more scalable operation because all the monitoring can be done in C

If the problem with monitoring is Python, we can have a small and simple helper doing the monitoring (for storage), like ioprocess.

> At first glance, Collectd seems to have all the tools we need.
> 1. A plugin interface
> (https://collectd.org/wiki/index.php/Plugin_architecture and
> https://collectd.org/wiki/index.php/Table_of_Plugins)
> 2. Support for notifications and thresholds
> (https://collectd.org/wiki/index.php/Notifications_and_thresholds)

Setting a threshold and getting a notification when the threshold is reached sounds like the best design for monitoring drive high watermarks. But I would like to depend on a component that does *only* this task, and serves only Vdsm.

> 3. a libvirt plugin https://collectd.org/wiki/index.php/Plugin:virt
>
> So, the picture is like
>
> 1. we start requiring collectd as dependency of Vdsm
> 2. we either configure it appropriately (collectd supports config drop-ins:
> /etc/collectd.d) or we document our requirements (or both)
> 3. collectd monitors the hosts and libvirt
> 4. Engine polls collectd
> 5. Vdsm listens for notifications

Sounds good

>
> Should libvirt deliver us the event we need (see
> https://bugzilla.redhat.com/show_bug.cgi?id=1181659),
> we can just stop using collectd notifications, and everything else keeps
> working as before.
>
> Challenges:
> 1. Collectd does NOT consider the plugin API stable
> (https://collectd.org/wiki/index.php/Plugin_architecture#The_interface.27s_stability)
> so the plugins should be included in the main tree, much like the modules
> of the Linux kernel.
> It is worth mentioning that the plugin API itself has a good deal of rough edges.
> We will need to maintain this plugin ourselves, *and* we need to maintain
> our thin API layer, to make sure the plugin loads and works with recent
> versions of collectd.
> 2. the virt plugin is out of date and doesn't report some data we need: see
> https://github.com/collectd/collectd/issues/1945
> 3. the notification message(s) are tailored for human consumption; those
> messages are not easy for machines to parse.
> 4. the threshold support in collectd seems to match values against constants;
> it doesn't seem possible to match a value against another one, as we need
> to do for high water monitoring (capacity VS allocation).
>
> How I'm addressing, or how I plan to address those challenges (aka action
> items):
> 1. I've been experimenting with out-of-tree plugins, and I managed to develop,
> build, install and run one out-of-tree plugin:
> https://github.com/mojaves/vmon/tree/master/collectd
> The development pace of collectd looks sustainable, so this doesn't look
> like such a big deal.
> Furthermore, we can engage with upstream to merge our plugins, either
> as-is or to extend existing ones.
> 2. Write another collectd plugin based on the Vdsm python code and/or my past
> accelerator executable project
> (https://github.com/mojaves/vmon)
> 3. patch the collectd notification code. It is yet another plugin
> OR
> 4. send notifications from the new virt module as per #2, bypassing the
> threshold system. This move could prevent the new virt module from being
> merged into the collectd tree.
>
> Current status of the action items:
> 1. done BUT PoC quality
> 2. To be done (more work than #1/possible dupe with the GitHub issue)
> 3. need more investigation, conflicts with #4
> 4. need more investigation, conflicts with #3
>
> All the code I'm working on can be found at https://github.com/mojaves/vmon
>
> Comments are appreciated
>
> --
> Francesco Romani
> RedHat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani

_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel
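Challenge #4 in the quoted mail says collectd thresholds seem to match values only against constants. The reply also prefers a small component that only sets thresholds and sends notifications. Below is a minimal Python sketch of what that per-drive check could look like. The limit is derived from each drive's current physical size, not from a constant. The sketch assumes the caller already samples allocation and physical size in bytes for each drive. libvirt reports both values, for example through `virDomain.blockInfo()` in the Python bindings, which returns capacity, allocation and physical. `WatermarkMonitor`, `free_threshold_bytes` and the `notify` callback are hypothetical names, not Vdsm's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical callback signature: (drive_name, allocation, physical) -> None
Notify = Callable[[str, int, int], None]


@dataclass
class WatermarkMonitor:
    """Sketch of a dedicated high-watermark checker (hypothetical, not Vdsm code).

    A drive is above its watermark when the free space left in the
    underlying volume (physical - allocation) drops below
    free_threshold_bytes. Because the limit depends on each drive's
    current physical size, it is not a constant threshold.
    """
    free_threshold_bytes: int
    notify: Notify
    # Per-drive state from the previous sample, used to notify once per crossing.
    _exceeded: Dict[str, bool] = field(default_factory=dict)

    def check(self, drive: str, allocation: int, physical: int) -> bool:
        """Record one sample; return True if the drive is above its watermark."""
        exceeded = physical - allocation < self.free_threshold_bytes
        # Edge-triggered: call notify only on the transition ok -> exceeded.
        if exceeded and not self._exceeded.get(drive, False):
            self.notify(drive, allocation, physical)
        self._exceeded[drive] = exceeded
        return exceeded


if __name__ == "__main__":
    MiB = 1024 ** 2
    events = []
    mon = WatermarkMonitor(512 * MiB, lambda d, a, p: events.append(d))
    mon.check("vda", 1024 * MiB, 2048 * MiB)  # 1024 MiB free: below watermark
    mon.check("vda", 1600 * MiB, 2048 * MiB)  # 448 MiB free: crosses, notifies
    mon.check("vda", 1700 * MiB, 2048 * MiB)  # still above watermark, no new event
    mon.check("vda", 1700 * MiB, 3072 * MiB)  # volume extended: back below
    print(events)  # ['vda']
```

The check is edge-triggered. The callback fires once when a drive crosses its watermark, and fires again only after the drive has dropped back below it (for example, after the volume was extended). As a result, frequent polling does not flood the consumer with repeated events for the same condition.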

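Challenge #3 in the quoted mail notes that collectd notification messages are written for humans, and action item #4 proposes sending notifications directly from a new virt module. Below is a hedged sketch of one way to keep those notifications machine-readable: put a JSON body in the notification message. It assumes the sender is a plugin loaded by collectd's Python plugin, where the `collectd` module provides `Notification`, `NOTIF_WARNING` and `dispatch()` as described in collectd-python(5); check these against the installed collectd version. The plugin name `vdsm_watermark` and the JSON schema are hypothetical. The encode and decode helpers run without collectd; only `dispatch_watermark_event` needs the daemon.

```python
import json
from typing import Any, Dict

try:
    # Only importable inside collectd's Python plugin, not in a plain interpreter.
    import collectd
except ImportError:
    collectd = None


def encode_watermark_event(vm_id: str, drive: str, allocation: int, physical: int) -> str:
    """Build a machine-readable notification body (hypothetical schema)."""
    payload = {
        "event": "drive_watermark",
        "vm_id": vm_id,
        "drive": drive,
        "allocation": allocation,
        "physical": physical,
    }
    # sort_keys gives a stable string, which makes logs and comparisons simpler.
    return json.dumps(payload, sort_keys=True)


def decode_watermark_event(message: str) -> Dict[str, Any]:
    """Parse a body on the receiving (Vdsm) side; raise ValueError if malformed."""
    payload = json.loads(message)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict) or payload.get("event") != "drive_watermark":
        raise ValueError("not a drive_watermark event: %r" % (message,))
    return payload


def dispatch_watermark_event(vm_id: str, drive: str, allocation: int, physical: int) -> None:
    """Send the event as a collectd notification, bypassing the threshold system."""
    if collectd is None:
        raise RuntimeError("the collectd module is only available inside collectd")
    notification = collectd.Notification(
        plugin="vdsm_watermark",  # hypothetical plugin name
        plugin_instance=vm_id,
        type_instance=drive,
        severity=collectd.NOTIF_WARNING,
        message=encode_watermark_event(vm_id, drive, allocation, physical),
    )
    notification.dispatch()
```

A fixed JSON body means the consumer does not need to parse free-form English text, and the plugin can add new fields without breaking existing readers.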