I have been working recently on a branch of Ganglia/monitor-core that
allows gmond to send metrics directly to an InfluxDB database, and I am
requesting comments and feedback before submitting a formal pull request.

It can be found here:

  https://github.com/hawson/monitor-core/tree/influxdb


This is purely a change to the gmond agent.  Other programs (e.g. gmetad,
gmetric) and components (the web UI) are not changed.  However, a logical
next phase could be to rework the web UI and gmetad to use InfluxDB as a
backend.  Since my original post to the ganglia-developers list a few days
ago, I've made a few minor improvements (the new "measurement" attribute,
and adding InfluxDB metrics to the gstatus module).


Changes are relatively isolated.  Most of the new functionality lives in a
new lib/influxdb.c file, which is hooked into gmond as part of the existing
Ganglia_collection_group_send() function in the main gmond.c code.  Thus,
when a packet would normally be sent to another gmond, it can also be sent
to an InfluxDB channel at the same time.  There are, of course, various
other changes sprinkled about in other files, mostly to add new gmond.conf
options.
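For readers new to InfluxDB, here is a minimal sketch of what sending to an
"InfluxDB channel" could look like at the socket level.  It assumes a UDP
transport (8089 is the default port of the InfluxDB 1.x UDP listener, which
matches the port in the influxdb_send_channel example), and the names
influx_channel, influx_channel_open() and influx_channel_send() are
hypothetical; this is not the code in lib/influxdb.c.

```c
#define _POSIX_C_SOURCE 200112L
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Hypothetical stand-in for an influxdb_send_channel: one connected UDP
 * socket.  The branch's real data structures may differ. */
typedef struct {
    int fd;
} influx_channel;

/* Resolve host:port and connect a UDP socket to the first usable address.
 * Returns 0 on success, -1 on failure. */
int influx_channel_open(influx_channel *ch, const char *host, const char *port)
{
    struct addrinfo hints, *res, *rp;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_DGRAM;   /* ASSUMPTION: UDP transport */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    ch->fd = -1;
    for (rp = res; rp != NULL; rp = rp->ai_next) {
        int fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0) {
            ch->fd = fd;
            break;
        }
        close(fd);
    }
    freeaddrinfo(res);
    return ch->fd < 0 ? -1 : 0;
}

/* Send a buffer of newline-separated line-protocol points as a single
 * datagram.  Returns the number of bytes sent, or -1 on error. */
ssize_t influx_channel_send(const influx_channel *ch, const char *points)
{
    return send(ch->fd, points, strlen(points), 0);
}
```

A connected UDP socket keeps each send fire-and-forget, in the same spirit
as gmond's existing udp_send_channel: a lost datagram is a lost sample, not
a stalled agent.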

The gmond.conf documentation and the default configuration file (from 'gmond
-t') have also been updated to cover the new configuration options.

The first new option is an influxdb_send_channel stanza.  It is fairly
simple, with three options.

  influxdb_send_channel {
    host     = myinfluxdb.example.com
    port     = 8089
    default_tags = zone=us-east,host_class=hpc  // optional tags sent with each metric
  }

The "host" and "port" attributes are required, and their purpose should be
obvious.  The "default_tags" attribute is optional.  InfluxDB permits tags
to be associated with each time/key/value tuple; this is how hostnames are
stored, for example.  This attribute allows default tags to be associated
with every metric sent, for example to identify an HPC cluster, an AWS
zone, or some other useful bit of metadata.
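To illustrate how default_tags might combine with the per-host tag, the
sketch below formats a single point in InfluxDB's line protocol
(measurement,tag=value,... field=value).  It assumes the default_tags string
is already in key=value,key=value form and appends it verbatim, escapes only
the host tag value, and assumes measurement and field names (such as
load_one) contain no spaces or commas.  format_point() and escape_tag() are
hypothetical helpers, not functions from the branch.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst, backslash-escaping the characters that InfluxDB line
 * protocol treats specially in tag keys and values: comma, space and '='.
 * Returns the escaped length, or -1 if dst (cap bytes) is too small. */
static int escape_tag(char *dst, size_t cap, const char *src)
{
    size_t n = 0;

    if (cap == 0)
        return -1;
    for (; *src; src++) {
        int special = (*src == ',' || *src == ' ' || *src == '=');
        if (n + (special ? 2 : 1) >= cap)   /* keep room for the NUL */
            return -1;
        if (special)
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return (int)n;
}

/* Format one point:  measurement,host=<host>[,<default_tags>] field=value\n
 * ASSUMPTIONS: default_tags (may be NULL) is already valid "k=v,k=v" text;
 * measurement and field need no escaping.  No timestamp is written, so the
 * InfluxDB server assigns its own receive time.
 * Returns the length written, or -1 if out is too small. */
int format_point(char *out, size_t cap, const char *measurement,
                 const char *host, const char *default_tags,
                 const char *field, double value)
{
    char esc_host[256];
    int have_tags = default_tags != NULL && *default_tags != '\0';
    int n;

    if (escape_tag(esc_host, sizeof esc_host, host) < 0)
        return -1;
    n = snprintf(out, cap, "%s,host=%s%s%s %s=%.15g\n",
                 measurement, esc_host,
                 have_tags ? "," : "", have_tags ? default_tags : "",
                 field, value);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

With the example configuration above, a load_one sample of 0.5 from node01
would come out as
load,host=node01,zone=us-east,host_class=hpc load_one=0.5 (plus a newline).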

The other change to gmond.conf is also optional, but strongly recommended.
Every collection_group stanza may now have an optional "measurement"
attribute.  An example for some of the system load metrics:

  collection_group {
    collect_every = 20
    time_threshold = 90
    measurement = "load"  // <<<<<<<<<<<<<<<  new attribute
    metric {
      name = "load_one"
      title = "One Minute Load Average"
    }
    metric {
      name = "load_five"
      title = "Five Minute Load Average"
    }
    [...]
  }

This attribute is used to help InfluxDB organize metrics into groups
called "measurements".  Measurements are similar in function to an SQL table
(InfluxDB is not an SQL database, and the analogy is not perfect).  Since
most metrics in a collection group tend to be similar (all CPU stats are
collected at the same time, network stats at another, etc.), adding this
attribute at the collection_group level seems to make the most sense.  If a
collection_group does not have a measurement attribute, the metric name
(e.g. "load_one") is used instead; this is not recommended.  The updated
documentation for gmond.conf has more details and several examples.  Note
that adding this support did require some minor reorganization of the
default gmond.conf file.
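One plausible mapping of the measurement attribute is sketched below.  It
assumes that all metrics in a collection_group are sent together: with a
measurement set, each metric becomes a field of a single point; without
one, each metric becomes its own measurement named after itself, as
described above.  Whether the branch actually batches a group's metrics into
one point is an assumption here, and metric_sample, pick_measurement() and
format_group() are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one metric in a collection_group. */
typedef struct {
    const char *name;   /* e.g. "load_one" */
    double value;
} metric_sample;

/* Use the group's "measurement" attribute when set; otherwise fall back
 * to the metric's own name (the discouraged case). */
const char *pick_measurement(const char *group_measurement,
                             const char *metric_name)
{
    return (group_measurement && *group_measurement) ? group_measurement
                                                     : metric_name;
}

/* Write line protocol for one collection_group into out.
 * ASSUMPTION: the whole group is sent at once.  With a measurement set,
 * every metric becomes a field of one point; without one, each metric
 * becomes a separate point named after itself.  Returns the length
 * written, or -1 if out is too small. */
int format_group(char *out, size_t cap, const char *group_measurement,
                 const char *host, const metric_sample *m, size_t count)
{
    int grouped = group_measurement && *group_measurement;
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (count == 0)
        return 0;

    for (size_t i = 0; i < count; i++) {
        int n;
        if (!grouped || i == 0)      /* start a new point */
            n = snprintf(out + used, cap - used, "%s%s,host=%s %s=%.15g",
                         i == 0 ? "" : "\n",
                         pick_measurement(group_measurement, m[i].name),
                         host, m[i].name, m[i].value);
        else                          /* add another field to the point */
            n = snprintf(out + used, cap - used, ",%s=%.15g",
                         m[i].name, m[i].value);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    if (used + 1 >= cap)              /* room for '\n' and the NUL */
        return -1;
    out[used++] = '\n';
    out[used] = '\0';
    return (int)used;
}
```

Under this mapping, the load group above yields one point,
load,host=node01 load_one=0.5,load_five=0.25,..., so an InfluxQL query such
as SELECT load_one, load_five FROM load reads related series from a single
measurement.  That is the practical argument for setting the attribute.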

I know of several improvements that could be made, but believe that the code
is fit for general review.

Comments, questions and corrections are all welcome, either on the list, on
the GitHub URL above, on IRC, or by email.

-- 
Jesse Becker
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general