Ioana Lasc has proposed merging 
~ilasc/canonical-is-prometheus:add-lp-ppa-publisher-alert into 
canonical-is-prometheus:master.

Commit message:
Add alert for PPA publisher

Requested reviews:
  Launchpad code reviewers (launchpad-reviewers)

For more details, see:
https://code.launchpad.net/~ilasc/canonical-is-prometheus/+git/canonical-is-prometheus/+merge/433207

From what I can decipher in the logs, this is a cron job set to run every
minute unless there is a lock on /var/lock/launchpad-publisher.lock (which
means there is an active run); see line 7 in
https://bazaar.launchpad.net/~launchpad-pqm/lp-production-crontabs/trunk/view/head:/ppa.lp.internal-lp_publish

Inspecting lp_publish/cron.ppa.log, the longest run appears to have taken 17
minutes, so if the publisher hasn't completed a run in 30 minutes we should
probably alert and look at the log.
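To make the timing concrete, here is a simplified Python model of how the rule in the diff would behave. It is a sketch, not Prometheus itself, and it assumes that rules are evaluated once a minute and that a sample of lp_script_activity_count appears only at minutes when publish-distro completed a run. absent_over_time(...[30m]) returns a result once no sample falls in the last 30 minutes (modelled here as the window (t - 30, t], a simplification of the range-selector boundary), and `for: 5m` keeps the alert pending until that has held for 5 minutes.

```python
from typing import Iterable, Optional

WINDOW_MIN = 30  # range selector of absent_over_time(...[30m])
FOR_MIN = 5      # the rule's "for: 5m" clause


def absent(samples: list, t: int, window: int = WINDOW_MIN) -> bool:
    """True if no sample falls in the window (t - window, t]."""
    return not any(t - window < s <= t for s in samples)


def first_firing_minute(samples: Iterable[int], horizon: int,
                        window: int = WINDOW_MIN,
                        hold: int = FOR_MIN) -> Optional[int]:
    """Return the first minute at which the alert would fire, or None.

    Evaluates once per minute (an assumed evaluation interval). The alert
    becomes pending when the expression first returns a result and fires
    once it has stayed pending for `hold` minutes; a sample inside the
    window resets it.
    """
    samples = sorted(samples)
    pending_since = None
    for t in range(horizon + 1):
        if absent(samples, t, window):
            if pending_since is None:
                pending_since = t
            if t - pending_since >= hold:
                return t
        else:
            pending_since = None
    return None
```

Under these assumptions, a publisher whose last completed run was at minute 10 would trigger the alert at minute 45, i.e. about 35 minutes after the last successful run, while a single 17-minute run never opens a 30-minute gap.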

The metric already exists: `emit_script_activity_metric` emits it on
successful completion of any LaunchpadCronScript. I tested the expression in
Prometheus and it looks right when compared against the outages over the last
few days.
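One way to repeat that check against past outages is the Prometheus HTTP API's range-query endpoint (`/api/v1/query_range`). The sketch below builds the request URL and interprets the JSON response; the helper names are hypothetical, the base URL used with it would be a placeholder for the real server, and the 60s step is an arbitrary choice. A non-empty `result` means the expression returned data, i.e. the alert condition held at some point in the queried range.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Expression from the rule in this proposal.
EXPR = ("absent_over_time(lp_script_activity_count"
        "{env='production', name='publish-distro'}[30m])")


def range_query_url(base_url: str, expr: str, start: float, end: float,
                    step: str = "60s") -> str:
    """Build a /api/v1/query_range URL for `expr` between two Unix timestamps."""
    params = {"query": expr, "start": str(start), "end": str(end), "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def condition_held(response: dict) -> bool:
    """True if the expression returned any series in the queried range."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    return bool(response["data"]["result"])


def fetch(url: str, timeout: float = 30.0) -> dict:
    """Run the query; needs network access to the Prometheus server."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `condition_held(fetch(range_query_url(base, EXPR, start, end)))` over a window covering a known outage should be True, and over a quiet window False.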
diff --git a/ols/launchpad.rules b/ols/launchpad.rules
index 2db3f47..9a404ba 100644
--- a/ols/launchpad.rules
+++ b/ols/launchpad.rules
@@ -23,6 +23,17 @@ groups:
         labels:
           severity: warning
 
+      - alert: LaunchpadPPAPublisherStuck
+        expr: absent_over_time(lp_script_activity_count{env='production', name='publish-distro'}[30m])
+        for: 5m
+        annotations:
+          summary: ppa publisher has not run for 30m
+          dashboard_url: https://grafana.admin.canonical.com/d/000000044/telegraf-host?orgId=1&from=now-2h&to=now&var-juju_controller=prodstack5-prodstack5-prodstack-is&var-juju_model=All&var-service=launchpad-ppa&var-juju_unit=launchpad-ppa%2F2
+          description: Launchpad Script {{ $labels.name }} is not running as expected.
+          playbook_url: https://wiki.canonical.com/InformationInfrastructure/IS/LaunchpadScripts#LaunchpadPPAPublisherStuck
+        labels:
+          severity: warning
+
       - alert: LaunchpadFlagExpiredMembershipsStuck
         expr: absent_over_time(lp_script_activity_count{env='production',host='loganberry',name='flag-expired-memberships'}[24h])
         for: 1h