It'd still be good to have that exposed as a metric, since:

 * that way you don't have to wait to make the mistake (or to find the
   logs from someone else's mistake) in order to wrap alerting around it
 * the metric's more or less the metric forever-ish, while it seems
   more likely that a well-intentioned phrasing change in one of the
   log messages could screw up whatever pattern's being used to match it
 * I personally think that the metric is somehow more in my face than
   the logs (e.g., "oh look, I dumped the metrics with a curl/wget and
   that looks very much like a counter we need to wrap something
   around" 😁)
 * for those living in the Prometheus/Grafana/Loki ecosystem, it may be
   a bit easier to just run a copy of the BIND exporter
   (https://github.com/prometheus-community/bind_exporter) than to make
   sure that all the logs are getting scraped appropriately and the
   path to get them into Loki works and keeps working all the time,
   since it's easier to generate a no-data alert for a metric than it
   is to say "this log message we never get, we still haven't gotten it"

And yes, I recognize that "well, Steve, the code's right over here, go to it" is a valid argument.

    -Steve

On 11/3/2023 6:09 AM, Vladimír Čunát via dns-operations wrote:

> My understanding is that in this case the signer was producing loud syslog warnings immediately when the issue happened (i.e. long before validation could fail).

_______________________________________________
dns-operations mailing list
dns-operations@lists.dns-oarc.net
https://lists.dns-oarc.net/mailman/listinfo/dns-operations
