BBlack has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/368779 )
Change subject: OCSP: Warn less, retry more
......................................................................
OCSP: Warn less, retry more
This doubles the OCSP fetcher executions to twice per day, and
reduces the warning thresholds so they don't trigger until at
least 2 straight days of failure.
Bug: T172116
Change-Id: I076b956f72e9dfd54e306eb316a21047eb4f1527
---
M modules/nagios_common/files/check_commands/check_ssl
M modules/sslcert/manifests/ocsp/init.pp
M modules/tlsproxy/manifests/ocsp.pp
3 files changed, 11 insertions(+), 8 deletions(-)
Approvals:
Ema: Looks good to me, but someone else must approve
BBlack: Verified; Looks good to me, approved
diff --git a/modules/nagios_common/files/check_commands/check_ssl b/modules/nagios_common/files/check_commands/check_ssl
index 7c05284..62a45d5 100755
--- a/modules/nagios_common/files/check_commands/check_ssl
+++ b/modules/nagios_common/files/check_commands/check_ssl
@@ -109,7 +109,7 @@
$ng->arg(
spec => 'ocspwarn=i',
help => 'Warning threshold for OCSP staple validity in seconds (default: %s)',
- default => 86400*3,
+ default => 86400*2,
);
$ng->arg(
spec => 'ocspcrit=i',
diff --git a/modules/sslcert/manifests/ocsp/init.pp b/modules/sslcert/manifests/ocsp/init.pp
index a9514a9..12e07f7 100644
--- a/modules/sslcert/manifests/ocsp/init.pp
+++ b/modules/sslcert/manifests/ocsp/init.pp
@@ -49,10 +49,12 @@
mode => '0755',
}
+ # Twice a day, 12h apart
+ $cron_h12 = fqdn_rand(12, 'e663dd38dd6d3384')
cron { 'update-ocsp-all':
command => '/usr/local/sbin/update-ocsp-all 2>&1 | logger -t update-ocsp-all',
minute => fqdn_rand(60, '1adf3dd699e51805'),
- hour => fqdn_rand(24, 'e663dd38dd6d3384'),
+ hour => [ $cron_h12, $cron_h12 + 12 ],
require => [
File['/usr/local/sbin/update-ocsp-all'],
File['/etc/update-ocsp.d'],
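
The hour selection in the hunk above can be illustrated with a small Python sketch. It is not Puppet code, and `stand_in_fqdn_rand` is a hypothetical stand-in that only mimics the documented behaviour of `fqdn_rand` (a deterministic per-host integer in the range 0 to max-1); it does not reproduce Puppet's actual algorithm. The host names are made up for illustration.

```python
import hashlib


def stand_in_fqdn_rand(maximum, fqdn, seed):
    """Hypothetical stand-in for Puppet's fqdn_rand (NOT its real algorithm).

    Returns a deterministic integer in [0, maximum) derived from the
    host name and the seed string.
    """
    digest = hashlib.sha256(f"{fqdn}:{seed}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % maximum


def ocsp_cron_hours(fqdn):
    """Mirror the manifest: pick an hour in 0..11, then add the hour 12h later."""
    cron_h12 = stand_in_fqdn_rand(12, fqdn, "e663dd38dd6d3384")
    return [cron_h12, cron_h12 + 12]


if __name__ == "__main__":
    # Made-up host names, only to show that each host gets a stable pair.
    for host in ("host-a.example.org", "host-b.example.org"):
        print(host, ocsp_cron_hours(host))
```

Because the first hour is drawn from 0..11, the second entry always stays within 12..23, so the array is a valid cron hour list and the two runs are exactly 12 hours apart on every host.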
diff --git a/modules/tlsproxy/manifests/ocsp.pp b/modules/tlsproxy/manifests/ocsp.pp
index 3d039cc..25df3d6 100644
--- a/modules/tlsproxy/manifests/ocsp.pp
+++ b/modules/tlsproxy/manifests/ocsp.pp
@@ -19,13 +19,14 @@
# fetch of data has a 4-7 day lifetime depending on the vendor (GlobalSign
# or Digicert)
#
- # The crit/warn values of 259500 and 86700 correspond to "1d5m" and
- # "3d5m", so those are basically warning if 1 updates in a row failed
- # for a given cert, and critical if 3 updates in a row failed (at which
- # point we have ~24h left to fix the situation before the validity window
- # expires).
+ # The warn and crit values of 173100 and 259500 correspond to "2d5m" and
+ # "3d5m", and are checking the mtime of the files (not the internal expiry
+ # times). This should give us ~24h to fix, assuming we're getting minimum
+ # 4-day staples. The live ssl checker also checks for internal timestamps
+ # nearing expiry as well (warn at 2 days left, crit at 1 day left), so
+ # we're covered on two fronts here.
- $check_args = '-c 259500 -w 86700 -d /var/cache/ocsp -g "*.ocsp"'
+ $check_args = '-c 259500 -w 173100 -d /var/cache/ocsp -g "*.ocsp"'
nrpe::monitor_service { 'ocsp-freshness':
description => 'Freshness of OCSP Stapling files',
nrpe_command => "/usr/lib/nagios/plugins/check-fresh-files-in-dir.py ${check_args}",
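
The threshold arithmetic behind `-w 173100 -c 259500` can be checked with a short Python sketch. The helper name `to_seconds` is hypothetical. The "time left" figure assumes a staple's validity window starts at the moment the file is written, which is a simplification: the real window depends on the timestamps inside the OCSP response.

```python
DAY = 86400      # seconds per day
MINUTE = 60      # seconds per minute
RUN_INTERVAL = 12 * 3600  # fetcher now runs twice a day, 12h apart


def to_seconds(days, minutes=0):
    """Convert a 'NdMm' style duration into seconds."""
    return days * DAY + minutes * MINUTE


warn = to_seconds(2, 5)   # "2d5m", the -w value
crit = to_seconds(3, 5)   # "3d5m", the -c value

# Assumption: a minimum-lifetime staple is valid for 4 days from file mtime.
min_staple_lifetime = to_seconds(4)
time_left_at_crit = min_staple_lifetime - crit

# Number of 12h fetch intervals that fit inside the warning threshold.
intervals_before_warn = warn // RUN_INTERVAL

if __name__ == "__main__":
    print(warn, crit)                  # 173100 259500
    print(time_left_at_crit / 3600)    # hours left once the check goes critical
    print(intervals_before_warn)       # 4
```

Under that assumption the critical alert leaves 86100 seconds, just under 24 hours, before a 4-day staple expires, and a warning implies roughly four consecutive failed fetches at the new twice-daily cadence.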
--
To view, visit https://gerrit.wikimedia.org/r/368779
Gerrit-MessageType: merged
Gerrit-Change-Id: I076b956f72e9dfd54e306eb316a21047eb4f1527
Gerrit-PatchSet: 3
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: BBlack
Gerrit-Reviewer: BBlack
Gerrit-Reviewer: Ema
Gerrit-Reviewer: jenkins-bot