Hey,

I'm a systems engineer and a contributor to the Prometheus monitoring
system.  I also maintain the servers for my company.

I've been following various ntpd replacement projects and I'm pretty
impressed with the progress of Chrony.

Before we could switch, I would need to replace our existing monitoring
of ntpd.  We currently parse the output of `ntpq -np` to generate
metrics.
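For context, here is roughly what that parsing involves, sketched in Python.  It assumes the classic ten-column `ntpq -np` layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with the peer selection tally character in the first column; `NtpqPeer` and `parse_ntpq_line` are hypothetical names, not part of any existing tool.

```python
from typing import NamedTuple, Optional


class NtpqPeer(NamedTuple):
    tally: str       # selection character in column 1, e.g. '*', '+', '-', ' '
    remote: str
    refid: str
    stratum: int
    peer_type: str   # the 't' column, e.g. 'u' for unicast
    delay_ms: float
    offset_ms: float
    jitter_ms: float


def parse_ntpq_line(line: str) -> Optional[NtpqPeer]:
    """Parse one peer row of `ntpq -np`; return None for header/separator rows.

    Assumes the ten-column layout described above; other ntpq versions may differ.
    """
    if len(line) < 2 or line.startswith("=") or line.lstrip().startswith("remote"):
        return None
    tally, fields = line[0], line[1:].split()
    if len(fields) != 10:
        return None
    remote, refid, st, t, _when, _poll, _reach, delay, offset, jitter = fields
    return NtpqPeer(tally, remote, refid, int(st), t,
                    float(delay), float(offset), float(jitter))
```

Each parsed row then maps onto one sample per metric family, keyed by `remote`.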

Prometheus[0] uses a simple metric+labels format, similar to systems
like OpenTSDB.
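For anyone unfamiliar with the format, a minimal Python sketch of rendering one gauge family in the Prometheus text exposition format: a `# TYPE` line, then the metric name, an optional `{label="value",...}` set, and the value.  `format_sample` and `format_gauge` are hypothetical helpers, and only label-value escaping (backslash, double quote, newline) is handled.

```python
from typing import Iterable, Mapping, Tuple


def _escape(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: Mapping[str, str], value: float) -> str:
    """Render one sample line: name{k="v",...} value."""
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{k}="{_escape(v)}"' for k, v in labels.items())
    return f"{name}{{{body}}} {value}"


def format_gauge(name: str,
                 samples: Iterable[Tuple[Mapping[str, str], float]]) -> str:
    """Render a whole gauge family, starting with its # TYPE line."""
    lines = [f"# TYPE {name} gauge"]
    lines += [format_sample(name, labels, value) for labels, value in samples]
    return "\n".join(lines)
```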

Here's an example of what `ntpq -np` turns into:

# TYPE node_ntpd_delay_milliseconds gauge
node_ntpd_delay_milliseconds{remote="130.149.17.8"} 17.092
node_ntpd_delay_milliseconds{remote="193.190.230.65"} 4.937
node_ntpd_delay_milliseconds{remote="82.95.215.61"} 11.726
# TYPE node_ntpd_jitter_milliseconds gauge
node_ntpd_jitter_milliseconds{remote="130.149.17.8"} 0.494
node_ntpd_jitter_milliseconds{remote="193.190.230.65"} 0.770
node_ntpd_jitter_milliseconds{remote="82.95.215.61"} 0.722
# TYPE node_ntpd_offset_milliseconds gauge
node_ntpd_offset_milliseconds{remote="130.149.17.8"} 1.675
node_ntpd_offset_milliseconds{remote="193.190.230.65"} 0.135
node_ntpd_offset_milliseconds{remote="82.95.215.61"} -0.645
# TYPE node_ntpd_peer_status gauge
node_ntpd_peer_status{remote="130.149.17.8",reference=".GPS.",stratum="1",type="unicast"} 3
node_ntpd_peer_status{remote="193.190.230.65",reference=".MRS.",stratum="1",type="unicast"} 4
node_ntpd_peer_status{remote="82.95.215.61",reference=".PPS.",stratum="1",type="unicast"} 6

This lets us keep time series for each peer and write rules such as
"node_ntpd_peer_status < 4" to find unsynced servers.  See [1] for the
mapping from peer status codes to values.
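A sketch of that mapping in Python, as I read the select-field table in [1] paired with the tally characters `ntpq -p` prints (treat the pairing as an assumption to verify against [1]).  `TALLY_TO_STATUS`, `peer_status` and `unsynced` are hypothetical names; `unsynced` mimics what the "< 4" rule flags.

```python
from typing import Dict, List

# Assumed pairing: ntpq tally character -> select-field code from [1].
TALLY_TO_STATUS: Dict[str, int] = {
    " ": 0,  # reject
    "x": 1,  # falsetick
    ".": 2,  # excess
    "-": 3,  # outlier
    "+": 4,  # candidate
    "#": 5,  # backup
    "*": 6,  # system peer
    "o": 7,  # PPS peer
}


def peer_status(tally: str) -> int:
    """Map a tally character to the numeric status exported as a gauge."""
    return TALLY_TO_STATUS[tally]


def unsynced(statuses: Dict[str, int]) -> List[str]:
    """Return the peers a `node_ntpd_peer_status < 4` rule would flag."""
    return sorted(remote for remote, status in statuses.items() if status < 4)
```

With the example values above, only 130.149.17.8 (status 3, an outlier) would be flagged.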

The above metrics are generated by a bash script, which works but isn't my
favorite way to deal with getting metrics from software.

So far, I haven't been able to find a good programmatic way to extract
stats with chronyc.  There are a bunch of annoying parsing issues with
commands like sourcestats.  The offset is printed with a varying unit
suffix, so I have to parse the suffix and convert every value to a
common unit.  I also haven't found much documentation on the protocol
between chronyc and chronyd.
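
As an illustration of that normalization, a small Python helper that converts such values to seconds.  It assumes each value is a signed integer or decimal followed by one of the suffixes ns, us, ms or s; that is an assumption about the current output, not a documented format, and `to_seconds` is a hypothetical name.

```python
import re

# Multipliers to seconds for the unit suffixes assumed above.
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_VALUE_RE = re.compile(r"^([+-]?\d+(?:\.\d+)?)(ns|us|ms|s)$")


def to_seconds(field: str) -> float:
    """Convert a value such as '+12us' or '-3ms' to seconds."""
    match = _VALUE_RE.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognised value: {field!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit]
```

Multiplying the result by 1000 gives milliseconds, matching the metric names above.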

A few specific questions:
* Would chrony be interested in supporting the Prometheus metrics format?
* Is there a mode that makes the various metrics outputs more
machine-readable? (JSON?)
* Is there documentation for the chronyc protocol outside the code?
* Are there any non-C chronyc client implementations? (Python/Ruby/whatever)

[0]: http://prometheus.io/
[1]: https://www.eecis.udel.edu/~mills/ntp/html/decode.html#peer

- Ben Kochie
