Ahoy, I have a cluster of 3 nodes, each with an HTTP /_up endpoint that returns 200 OK when all is well and hangs when it is not (i.e. when the node is offline). I expect to receive a persistent FAILURE notification on every pass of the main read loop while one of the nodes is down.
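To make the expected behaviour concrete, here is a toy model of the threshold settings quoted in the config further down (FailureMin 200, FailureMax 299, Persist true, PersistOK true). It is a sketch, not collectd's implementation: the Threshold class and simulate() helper are hypothetical, the 503 status is an invented example value, and treating a round where no value arrives (a hung request) as "nothing to compare, so no notification" is an assumption of the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Toy model of one threshold <Type> block; not collectd's code."""
    failure_min: float = 200.0
    failure_max: float = 299.0
    persist: bool = True       # notify on every out-of-range value
    persist_ok: bool = True    # notify on every in-range value
    last_state: str = "OKAY"   # assumed initial state

    def check(self, value: Optional[float]) -> Optional[str]:
        """Return the severity notified for one read-loop round, or None."""
        if value is None:
            # Assumption: no value dispatched this round (e.g. the request
            # hung), so there is nothing to range-check and nothing is sent.
            return None
        in_range = self.failure_min <= value <= self.failure_max
        state = "OKAY" if in_range else "FAILURE"
        changed = state != self.last_state
        self.last_state = state
        if changed:
            return state                      # state transitions always notify
        if state == "FAILURE" and self.persist:
            return state                      # repeat FAILURE every round
        if state == "OKAY" and self.persist_ok:
            return state                      # repeat OKAY every round
        return None


def simulate(rounds):
    """Run the model over a sequence of per-round values (None = no value)."""
    t = Threshold()
    return [t.check(v) for v in rounds]


if __name__ == "__main__":
    # 200 = node up, 503 = hypothetical error status, None = request hung
    print(simulate([200, 503, 503, None, 200]))
```

With Persist true, the model repeats FAILURE for every out-of-range value it receives; setting persist=False reduces that to a single notification per state change.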
BTW, the full log and minimal config are more readable here: https://gist.github.com/dch/f9d53d63c2417742d647d064970c067d

Metric collection works as expected, but when one node is down I see only a single FAILURE notification, not a repeated one on each pass of collectd's read loop:

option = Hostname; value = i09;
Created new plugin context.
plugin_load: plugin "uptime" successfully loaded.
plugin_load: plugin "curl" successfully loaded.
plugin_load: plugin "threshold" successfully loaded.
[2019-01-20 12:06:51] plugin_load: plugin "logfile" successfully loaded.
[2019-01-20 12:06:51] type = logfile, key = LogLevel, value = info
[2019-01-20 12:06:51] [info] plugin_load: plugin "target_notification" successfully loaded.
[2019-01-20 12:06:51] [info] Initialization complete, entering read-loop.
[2019-01-20 12:07:11] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c01, type = response_code, message = Host i09, plugin curl (instance couchdb_c01) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:12] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c02, type = response_code, message = Host i09, plugin curl (instance couchdb_c02) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:12] [error] curl plugin: curl_easy_perform failed with status 28: Connection timed out after 515 milliseconds
[2019-01-20 12:07:21] [info] Notification: severity = FAILURE, host = i09, plugin = curl, plugin_instance = couchdb_c03, type = response_code, message = i09/curl-couchdb_c03/response_code has not been updated for 29.474 seconds.
^^^ good, this is what I expected to see: curl fails and a notification is triggered.

[2019-01-20 12:07:21] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c01, type = response_code, message = Host i09, plugin curl (instance couchdb_c01) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:22] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c02, type = response_code, message = Host i09, plugin curl (instance couchdb_c02) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
[2019-01-20 12:07:32] [error] curl plugin: curl_easy_perform failed with status 28: Connection timed out after 528 milliseconds

^^^ whoops, where is the next notification?

[2019-01-20 12:07:41] [info] Notification: severity = OKAY, host = i09, plugin = curl, plugin_instance = couchdb_c01, type = response_code, message = Host i09, plugin curl (instance couchdb_c01) type response_code: All data sources are within range again. Current value of "value" is 200.000000.
...

# input
https://gist.github.com/dch/f9d53d63c2417742d647d064970c067d#file-collectd-conf-L23-L39

<Plugin curl>
  <Page "couchdb_c01">
    URL "http://c01.skunkwerks.at:5984/_up"
    Timeout 500
    MeasureResponseCode true
  </Page>
  <Page "couchdb_c02">
  ...

# notification
https://gist.github.com/dch/f9d53d63c2417742d647d064970c067d#file-collectd-conf-L23-L39

LoadPlugin target_notification
LoadPlugin threshold
<Plugin "threshold">
  <Plugin "curl">
    Instance "couchdb_c01"
    <Type "response_code">
      FailureMin 200
      FailureMax 299
      Persist true
      PersistOK true
    </Type>
    Instance "couchdb_c02"
    ...

Is this a bug, or do I need to arrange my collectd.conf differently?

FreeBSD 12.0-RELEASE-p2 amd64
collectd 5.8.1.git (FreeBSD packages)

A+
Dave

_______________________________________________
collectd mailing list
https://mailman.verplant.org/listinfo/collectd
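For anyone reproducing this, a small helper can count notifications per plugin_instance in a longer logfile, which makes "one FAILURE, then silence" easy to confirm across many read-loop rounds. This is a sketch assuming the logfile line format shown in the log above (timestamp, level, then "Notification: severity = ..., host = ..., plugin = ..., plugin_instance = ..., type = ..."); the count_notifications name is hypothetical, and lines without a plugin_instance field are simply not matched.

```python
import re
from collections import Counter

# Matches notification lines in the format shown in the log above, e.g.
# [2019-01-20 12:07:21] [info] Notification: severity = FAILURE, host = i09,
#   plugin = curl, plugin_instance = couchdb_c03, type = response_code, ...
NOTIFICATION = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[\w+\] Notification: severity = (?P<severity>\w+), "
    r"host = (?P<host>[^,]+), plugin = (?P<plugin>[^,]+), "
    r"plugin_instance = (?P<instance>[^,]+), type = (?P<type>[^,]+),"
)


def count_notifications(lines, severity="FAILURE"):
    """Count notifications of the given severity per plugin_instance."""
    counts = Counter()
    for line in lines:
        m = NOTIFICATION.search(line)
        if m and m.group("severity") == severity:
            counts[m.group("instance")] += 1
    return counts
```

Usage: `count_notifications(open("collectd.log"))` (the path is hypothetical) returns a Counter such as `{"couchdb_c03": 1}`; with Persist true one would expect that count to grow on each read-loop round while the node is down.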
