To give a bit more context, this is one of the changes discussed in the Alerts Review proposal <https://docs.google.com/document/d/1PQKabMx9qoAKQS6qlHJDs2z2B_Bum_KqLYRaZ1pzXGc/edit>. We are still seeking feedback on the proposals, so if you haven't read or responded yet, this is a great time!
Thanks to Ben for your help moving this forward.

Best,

Brian King
SRE, Data Platform/Search Platform
Wikimedia Foundation
IRC: inflatador

> On Feb 9, 2024, at 9:52 AM, Ben Tullis <[email protected]> wrote:
>
> Hello,
>
> This is just a quick message to let you know that we made some changes today
> to the monitoring configuration of many of the Data Platform Engineering
> servers. This may affect you if you participate in Ops Week
> <https://wikitech.wikimedia.org/wiki/Data_Engineering/Ops_week> for Data
> Engineering and friends.
>
> By default, all notification alerts from Icinga and Prometheus will now go to
> [email protected]
> <https://groups.google.com/a/wikimedia.org/g/data-platform-alerts> instead of
> [email protected]
> <https://lists.wikimedia.org/hyperkitty/list/[email protected]/>
> We are working to try to make sure that we can route any alert emails (and
> IRC pings) to the most appropriate team, principally so that we don't
> overload the person who is on Ops Week with a lot of messages that would be
> more appropriately routed to Data Platform SREs.
>
> Any scheduled tasks related to data pipelines and services critical for data
> processing are still going to be sent to the
> [email protected]
> <https://lists.wikimedia.org/hyperkitty/list/[email protected]/>
> list, so that's Airflow jobs, Refine tasks, Gobblin, Sqoop, Varnishkafka,
> Eventlogging etc.
>
> We haven't made any changes to the monitoring/notification settings of the
> Search and Query Services servers (Elasticsearch/WDQS/WCQS etc) nor have we
> made any changes to the Dumps servers. This mainly affects the analytics
> systems <https://wikitech.wikimedia.org/wiki/Analytics/Systems> and the rest
> of the Data Engineering team's infrastructure.
>
> Please do let us know if you have any queries or concerns about this change,
> or if anything doesn't look right to you.
>
> You can reach out on Slack at #data-engineering-collab or #data-platform-sre
> or on IRC at #wikimedia-analytics or #wikimedia-data-platform or to
> [email protected]
> <mailto:[email protected]> by email.
>
> Kind regards,
> Ben
>
> --
> Ben Tullis (he/him)
> Senior Site Reliability Engineer
> Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________
Analytics mailing list -- [email protected]
To unsubscribe send an email to [email protected]
