Hey, I'm a systems engineer and a contributor to the Prometheus monitoring system. I also maintain the servers for my company.
I've been following various ntpd replacement projects and I'm pretty impressed with the progress of Chrony. One of the things I would need to do in order to switch is replace our existing monitoring of ntpd. We currently parse the output of `ntpq -np` in order to generate metrics.

Prometheus[0] uses a simple metric+labels combination format, similar to things like OpenTSDB. Here's an example of what `ntpq -np` turns into:

# TYPE node_ntpd_delay_milliseconds gauge
node_ntpd_delay_milliseconds{remote="130.149.17.8"} 17.092
node_ntpd_delay_milliseconds{remote="193.190.230.65"} 4.937
node_ntpd_delay_milliseconds{remote="82.95.215.61"} 11.726
# TYPE node_ntpd_jitter_milliseconds gauge
node_ntpd_jitter_milliseconds{remote="130.149.17.8"} 0.494
node_ntpd_jitter_milliseconds{remote="193.190.230.65"} 0.770
node_ntpd_jitter_milliseconds{remote="82.95.215.61"} 0.722
# TYPE node_ntpd_offset_milliseconds gauge
node_ntpd_offset_milliseconds{remote="130.149.17.8"} 1.675
node_ntpd_offset_milliseconds{remote="193.190.230.65"} 0.135
node_ntpd_offset_milliseconds{remote="82.95.215.61"} -0.645
# TYPE node_ntpd_peer_status gauge
node_ntpd_peer_status{remote="130.149.17.8",reference=".GPS.",stratum="1",type="unicast"} 3
node_ntpd_peer_status{remote="193.190.230.65",reference=".MRS.",stratum="1",type="unicast"} 4
node_ntpd_peer_status{remote="82.95.215.61",reference=".PPS.",stratum="1",type="unicast"} 6

This allows us to keep running timeseries metrics for peers, and write rules for things like "node_ntpd_peer_status < 4" to find unsynced servers. See here[1] for the code-to-status-value map.

The above metrics are generated by a bash script, which works but isn't my favorite way to get metrics out of software.

So far, I haven't been able to find a good programmatic way to extract stats with chronyc. There are a bunch of annoying parsing issues with things like the sourcestats command.
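
As a concrete illustration of the kind of post-processing this takes, here is a minimal Python sketch that normalizes a value carrying a unit suffix into seconds. It assumes values of the form `+12us` or `-1.5ms`; the suffix table and the name `to_seconds` are purely illustrative and not part of any chrony interface.

```python
import re

# Assumed unit suffixes and their scale in seconds (illustrative only).
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# Optional sign, decimal number, then one of the assumed suffixes.
_VALUE_RE = re.compile(r"^([+-]?\d+(?:\.\d+)?)(ns|us|ms|s)$")


def to_seconds(text):
    """Convert a string such as '+12us' to a float number of seconds."""
    match = _VALUE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized value: %r" % text)
    number, unit = match.groups()
    return float(number) * UNIT_SECONDS[unit]


if __name__ == "__main__":
    for sample in ("+12us", "-1.5ms", "3s"):
        print(sample, to_seconds(sample))
```

Having to maintain a table like this in every consumer is exactly the sort of thing a machine-readable output mode would avoid.
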
The offset is printed with a unit that varies from line to line, so I have to parse the unit and convert everything to a single one. I haven't seen much documentation on the protocol between chronyc and chronyd.

A couple of specific questions:

* Would chrony be interested in supporting the Prometheus metrics format?
* Is there a mode for the various metrics outputs to be more machine readable? (json?)
* Is there documentation for the chronyc protocol outside the code?
* Are there any non-C chronyc client implementations? (python/ruby/whatever)

[0]: http://prometheus.io/
[1]: https://www.eecis.udel.edu/~mills/ntp/html/decode.html#peer

- Ben Kochie