Regarding your last point: it looks like email alerts are already sent in this
case, to the following users:

define contactgroup {
 contactgroup_name analytics
 members dvanliere,ezachte,dtaraborelli,otto,milimetric
}

and the role includes this group in the check's contact groups:

 nrpe::monitor_service { 'eventlogging':
   ensure        => 'present',
   description   => 'Check status of defined EventLogging jobs',
   nrpe_command  => '/usr/lib/nagios/plugins/check_eventlogging_jobs',
   require       => File['/usr/lib/nagios/plugins/check_eventlogging_jobs'],
   contact_group => 'admins,analytics',
 }
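For context, the contact_group value is resolved through the contactgroup definition above, and each member there is in turn a contact object. A member only receives mail if that contact has an address and an email notification command. Below is a minimal sketch of such a contact, assuming the stock 24x7 timeperiod and the notify-host-by-email / notify-service-by-email commands from the sample Nagios configuration (Icinga 1.x uses the same object syntax). The email address is a hypothetical placeholder, not the real one:

```
; Hypothetical contact for one member of the "analytics" contactgroup.
define contact {
    contact_name                   milimetric
    email                          milimetric@example.org   ; placeholder address
    host_notifications_enabled     1
    service_notifications_enabled  1
    host_notification_period       24x7                     ; assumes the stock timeperiod
    service_notification_period    24x7
    host_notification_options      d,u,r                    ; down, unreachable, recovery
    service_notification_options   w,u,c,r                  ; warning, unknown, critical, recovery
    host_notification_commands     notify-host-by-email     ; from the sample commands.cfg
    service_notification_commands  notify-service-by-email
}
```

If a member's contact lacks an address or has service notifications disabled, they will not receive the EventLogging alert even though they appear in the group.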

If someone is missing from this list, or the check needs to be added to
another service, I'll be glad to do it.
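On Ori's suggestion (quoted below) to give these alerts a higher profile: one option is a service escalation that widens the audience once a problem persists. This is a hypothetical sketch. It assumes the check is attached to host vanadium under the service description taken from the Puppet resource above, and analytics-pager is a made-up name standing in for whatever paging contact group ops actually uses. As I understand Nagios escalation semantics, an escalation notifies only the groups it lists, so the original groups are repeated:

```
; Hypothetical escalation for the EventLogging check.
define serviceescalation {
    host_name              vanadium
    service_description    Check status of defined EventLogging jobs
    first_notification     3        ; applies from the 3rd notification onward
    last_notification      0        ; 0 = keep using this escalation until recovery
    notification_interval  30       ; in interval_length units (minutes by default)
    contact_groups         admins,analytics,analytics-pager   ; analytics-pager is hypothetical
}
```

The service description must match exactly what nrpe::monitor_service generates, so that should be checked before relying on this.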

Matanya 

On 2014-03-20 13:50, Dan Andreescu wrote: 

> Thank you for the detailed write-up, Ori.
> 
>> We have to fix this. The level of maintenance that EventLogging gets is not 
>> proportional to its usage across the organization. Analytics, I really need 
>> you to step up your involvement. 
>> 
>> It was not long ago that EventLogging was running reliably for months at a 
>> time. What has changed is not system load, but the owner seat becoming 
>> vacant, leading to a gradual deterioration of the quality of monitoring and 
>> auditing practices.
> 
> Indeed, the owner seat is vacant. According to a recent discussion on the
> analytics list, we do not yet consider ourselves the proper owners of
> EventLogging. Our sprint planning is today; I'll bring it up and stress its
> importance in light of this downtime.
> 
>> Sean proposed moving the EventLogging database to m2, so that it runs on
>> separate hardware from the research databases. I think he's right. I filed
>> <https://rt.wikimedia.org/Ticket/Display.html?id=7081> [1] to request the
>> migration.
> 
> Thank you, I support isolation. 
> 
>> Finally, I think EventLogging Icinga alerts should have a higher profile,
>> and possibly page someone. Issues can usually be debugged using the
>> eventloggingctl tool on vanadium and by inspecting the log files at
>> vanadium:/var/log/upstart/eventlogging-*.
> 
> I think this is the key reason the failure was ignored, so I agree here. We
> should, at the very least, forward these alerts by email to the analytics
> devs. I have no idea how to set that up, so if anyone would like to help,
> that'd be great.
> 
> _______________________________________________
> Ops mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/ops [2]
 

Links:
------
[1] https://rt.wikimedia.org/Ticket/Display.html?id=7081
[2] https://lists.wikimedia.org/mailman/listinfo/ops
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
