On Thu, 27 Feb 2020 at 00:47, Rick Elrod <[email protected]> wrote:

> I'd like to apply the following which does:
> - Adds a script I wrote for reading a timestamp from a file on disk
> and alerting if the timestamp within it is NOT within a particular
> delta to now.
> - Applies this to sundries01 and uses it to check
> /srv/websites/getfedora.org/build.timestamp.txt which now gets
> generated as part of the websites build.
>
> The purpose is because sometimes someone will commit something to the
> websites repo which breaks the build, but because of how we have
> things set up in openshift (cronjob), we don't get any kind of alert
> when that happens.
>
> Right now this sets the delta to 3 hours. In theory it should be 1,
> but I figure let it try to build a few times before we start alerting.
>
> Rick
>
>
>
Patch has been reviewed and looks correct for nagios and nrpe.
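
For anyone who wants to exercise the check logic away from the Nagios host, here is a minimal sketch of the same comparison written as a function. It is not the code in the patch: the names `check_timestamp` and `demo`, the injectable `now` argument, and the message wording are assumptions made for illustration, and it assumes Python 3. The return codes follow the usual Nagios plugin convention that the patch also uses (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import tempfile
import time

# Nagios plugin exit codes, matching the values used in the patch
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_timestamp(path, delta, now=None):
    """Return (exit_code, message) for the epoch timestamp stored in `path`.

    `now` is a hypothetical parameter added so the clock can be fixed in tests.
    """
    now = time.time() if now is None else now
    try:
        with open(path, 'r') as f:
            timestamp = float(f.read().strip())
    except (OSError, ValueError):
        # Missing/unreadable file, or contents that are not a number
        return UNKNOWN, 'UNKNOWN: Unable to open/read file path'
    age = now - timestamp
    if age > delta:
        return CRITICAL, 'CRITICAL: timestamp is %.2f seconds old (limit %.2f)' % (age, delta)
    return OK, 'OK: timestamp is %.2f seconds old (limit %.2f)' % (age, delta)


def demo():
    """Exercise the three outcomes against a throwaway file with a fixed clock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'build.timestamp.txt')
        with open(path, 'w') as f:
            f.write('1000.0\n')
        # One hour old with a 3 hour (10800 s) limit: OK
        fresh = check_timestamp(path, 10800, now=1000.0 + 3600)
        # Four hours old with the same limit: CRITICAL
        stale = check_timestamp(path, 10800, now=1000.0 + 4 * 3600)
        # File that does not exist: UNKNOWN
        missing = check_timestamp(os.path.join(tmp, 'nope.txt'), 10800, now=0)
    return fresh[0], stale[0], missing[0]


if __name__ == '__main__':
    print(demo())
```

Passing the clock in as an argument keeps the comparison deterministic, so the 10800 second threshold from the nrpe config can be checked without waiting on a real build.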



> commit 657d050f6d699bc43973d968cd93d12131fca7f2
> Author: Rick Elrod <[email protected]>
> Date:   Thu Feb 27 05:29:24 2020 +0000
>
>     nagios: Add script and check for checking that a timestamp within a file is within a delta of now, and then use this for alerting when websites stop building
>
>     Signed-off-by: Rick Elrod <[email protected]>
>
> diff --git a/roles/nagios_client/files/scripts/check_timestamp_from_file b/roles/nagios_client/files/scripts/check_timestamp_from_file
> new file mode 100644
> index 0000000..9064337
> --- /dev/null
> +++ b/roles/nagios_client/files/scripts/check_timestamp_from_file
> @@ -0,0 +1,43 @@
> +#!/usr/bin/env python
> +
> +# Takes a path to a file and a delta. The file must simply contain an epoch
> +# timestamp. It can be an integer or a float, as can the delta.
> +#
> +# Alerts critical if (now - timestamp contained in file) > delta.
> +#
> +# Rick Elrod <[email protected]>
> +# MIT
> +
> +import sys
> +import time
> +
> +if len(sys.argv) != 3:
> +    print('UNKNOWN: Pass path to file and delta as parameters')
> +    sys.exit(3)
> +
> +filename = sys.argv[1]
> +delta = float(sys.argv[2])
> +
> +timestamp = None
> +
> +try:
> +    with open(filename, 'r') as f:
> +        timestamp = float(f.read().strip())
> +except Exception as e:
> +    print('UNKNOWN: Unable to open/read file path')
> +    sys.exit(3)
> +
> +difference = round(time.time() - timestamp, 2)
> +if difference > delta:
> +    print(
> +        'CRITICAL: Timestamp in file (%.2f) exceeds delta (%.2f) by %.2f seconds' % (
> +            timestamp,
> +            delta,
> +            difference - delta))
> +    sys.exit(2)
> +
> +print('OK: Timestamp in file (%.2f) is within delta (%.2f) of now, by %.2f seconds' % (
> +    timestamp,
> +    delta,
> +    abs(difference - delta)))
> +sys.exit(0)
> diff --git a/roles/nagios_client/tasks/main.yml b/roles/nagios_client/tasks/main.yml
> index 2e5e0df..8e71a3b 100644
> --- a/roles/nagios_client/tasks/main.yml
> +++ b/roles/nagios_client/tasks/main.yml
> @@ -47,6 +47,7 @@
>    - check_osbs_api.py
>    - check_ipa_replication
>    - check_redis_queue.sh
> +  - check_timestamp_from_file
>    when: not inventory_hostname.startswith('noc')
>    tags:
>    - nagios_client
> @@ -226,6 +227,16 @@
>    tags:
>    - nagios_client
>
> +- name: install nrpe checks for sundries/websites
> +  template: src={{ item }}.j2 dest=/etc/nrpe.d/{{ item }} owner=root group=root mode=0644
> +  with_items:
> +  - check_websites_buildtime.cfg
> +  when: inventory_hostname.startswith('sundries')
> +  notify:
> +  - restart nrpe
> +  tags:
> +  - nagios_client
> +
>  - name: install nrpe config for the RabbitMQ checks
>    template:
>      src: "rabbitmq_args.ini.j2"
> diff --git a/roles/nagios_client/templates/check_websites_buildtime.cfg.j2 b/roles/nagios_client/templates/check_websites_buildtime.cfg.j2
> new file mode 100644
> index 0000000..ff5639d
> --- /dev/null
> +++ b/roles/nagios_client/templates/check_websites_buildtime.cfg.j2
> @@ -0,0 +1,2 @@
> +# Alert if websites haven't been built in 3 hours
> +command[check_websites_buildtime]={{ libdir }}/nagios/plugins/check_timestamp_from_file /srv/websites/getfedora.org/build.timestamp.txt 10800
> diff --git a/roles/nagios_server/templates/nagios/services/websites.cfg.j2 b/roles/nagios_server/templates/nagios/services/websites.cfg.j2
> index 85e8f8e..c8958d7 100644
> --- a/roles/nagios_server/templates/nagios/services/websites.cfg.j2
> +++ b/roles/nagios_server/templates/nagios/services/websites.cfg.j2
> @@ -316,4 +316,14 @@ define service {
>    use                   ppc-secondarytemplate
>  }
>
> +## Auxiliary to websites but necessary to make them happen
> +
> +define service {
> +  host_name             sundries01.phx2.fedoraproject.org
> +  service_description   websites build happened recently
> +  check_command         check_by_nrpe!check_websites_buildtime
> +  use                   websitetemplate
> +}
> +
> +
>  {% endif %}
> _______________________________________________
> infrastructure mailing list -- [email protected]
> To unsubscribe send an email to
> [email protected]
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/[email protected]
>


-- 
Stephen J Smoogen.