On Tue, Oct 11, 2016 at 2:05 PM, Francesco Romani <[email protected]> wrote:
> Hi all,
>
> In the last 2.5 days I was exploring if and how we can integrate collectd and 
> Vdsm.

Some comments regarding storage high watermarks only. I will comment later
on other aspects.

> The final picture could look like:
> 1. collectd does all the monitoring and reporting currently Vdsm does
> 2. Engine consumes data from collectd
> 3. Vdsm consumes *notifications* from collectd - for few but important tasks 
> like Drive high water mark monitoring

Drive high watermark monitoring is our core business; we cannot outsource
it to collectd.

Vdsm will always monitor high watermarks directly from libvirt.

> Benefits (aka: why to bother?):
> 1. less code in Vdsm / long-awaited modularization of Vdsm
> 2. better integration with the system, reuse of well-known components
> 3. more flexibility in monitoring/reporting: collectd is special purpose 
> existing solution
> 4. faster, more scalable operation because all the monitoring can be done in C

If the problem with monitoring is Python, we can have a small and simple
helper doing the monitoring (for storage), like ioprocess.

> At first glance, Collectd seems to have all the tools we need.
> 1. A plugin interface 
> (https://collectd.org/wiki/index.php/Plugin_architecture and 
> https://collectd.org/wiki/index.php/Table_of_Plugins)
> 2. Support for notifications and thresholds 
> (https://collectd.org/wiki/index.php/Notifications_and_thresholds)

Setting a threshold and getting a notification when the threshold is reached
sounds like the best design for monitoring drive high watermarks.

But I would like to depend on a component that does *only* this task, and
serves only Vdsm.
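
To make the requirement concrete, here is a minimal sketch of the kind of
check such a component would do. It is not Vdsm code: the names
(DriveSample, WatermarkMonitor, free_threshold) are made up for illustration,
and I assume each sample carries the drive allocation and the current volume
size, as libvirt block info reports them. The point is that the threshold
compares one measured value against another, not against a constant, and
that the consumer gets a single notification per crossing.

```python
from typing import Callable, NamedTuple


class DriveSample(NamedTuple):
    """One measurement for a drive (hypothetical structure)."""
    name: str
    allocation: int  # highest allocated offset, in bytes
    physical: int    # current size of the underlying volume, in bytes


def over_watermark(sample: DriveSample, free_threshold: int) -> bool:
    """Return True if the free space left in the volume is below the threshold.

    Both sides of the comparison are measured values; only the
    threshold itself is a constant.
    """
    return sample.physical - sample.allocation < free_threshold


class WatermarkMonitor:
    """Notify once when a drive crosses the watermark, then stay quiet
    until the drive drops back under it (e.g. after the volume was extended).
    """

    def __init__(self, free_threshold: int,
                 notify: Callable[[DriveSample], None]) -> None:
        self._free_threshold = free_threshold
        self._notify = notify
        self._reported = set()  # names of drives already reported

    def update(self, sample: DriveSample) -> None:
        if over_watermark(sample, self._free_threshold):
            if sample.name not in self._reported:
                self._reported.add(sample.name)
                self._notify(sample)
        else:
            # Back under the watermark: allow a new notification later.
            self._reported.discard(sample.name)
```

A caller would feed update() with samples at whatever interval the monitoring
runs, and do the volume extension in the notify callback.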

> 3. a libvirt plugin https://collectd.org/wiki/index.php/Plugin:virt
>
> So, the picture is like
>
> 1. we start requiring collectd as dependency of Vdsm
> 2. we either configure it appropriately (collectd supports config drop-ins: 
> /etc/collectd.d) or we document our requirements (or both)
> 3. collectd monitors the hosts and libvirt
> 4. Engine polls collectd
> 5. Vdsm listens for notifications

Sounds good

>
> Should libvirt deliver us the event we need (see 
> https://bugzilla.redhat.com/show_bug.cgi?id=1181659),
> we can just stop using collectd notifications, everything else works as 
> previously.
>
> Challenges:
> 1. Collectd does NOT consider the plugin API stable 
> (https://collectd.org/wiki/index.php/Plugin_architecture#The_interface.27s_stability)
>    so the plugins should be included in the main tree, much like the modules 
> of the linux kernel
>    Worth mentioning that the plugin API itself has a good deal of rough edges.
>    we will need to maintain this plugin ourselves, *and* we need to maintain 
> our thin API
>    layer, to make sure the plugin loads and works with recent versions of 
> collectd.
> 2. the virt plugin is out of date, doesn't report some data we need: see 
> https://github.com/collectd/collectd/issues/1945
> 3. the notification message(s) are tailored for human consumption, those 
> messages are not easy
>    to parse for machines.
> 4. the threshold support in collectd seems to match values against constants; 
> it doesn't seem possible
>    to match a value against another one, as we need to do for high water 
> monitoring (capacity VS allocation).
>
> How I'm addressing, or how I plan to address those challenges (aka action 
> items):
> 1. I've been experimenting with out-of-tree plugins, and I managed to develop, 
> build, install and run
>    one out-of-tree plugin: 
> https://github.com/mojaves/vmon/tree/master/collectd
>    The development pace of collectd looks sustainable, so this doesn't look 
> such a big deal.
>    Furthermore, we can engage with upstream to merge our plugins, either 
> as-is or to extend existing ones.
> 2. Write another collectd plugin based on the Vdsm python code and/or my past 
> accelerator executable project
>    (https://github.com/mojaves/vmon)
> 3. patch the collectd notification code. It is yet another plugin
>    OR
> 4. send notifications from the new virt module as per #2, bypassing the 
> threshold system. This move could prevent
>    the new virt module from being merged in the collectd tree.
>
> Current status of the action items:
> 1. done BUT PoC quality
> 2. To be done (more work than #1/possible dupe with github issue)
> 3. need more investigation, conflicts with #4
> 4. need more investigation, conflicts with #3
>
> All the code I'm working on will be found on https://github.com/mojaves/vmon
>
> Comments are appreciated
>
> --
> Francesco Romani
> RedHat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani