It seems like it would depend on the class of error. 48 hours for events not syncing, fine. 48 hours of /total data loss/ is a completely different class of problem.
On 27 November 2015 at 11:35, Nuria Ruiz <[email protected]> wrote:
>>Unfortunately, the only team-members working full-time yesterday and today
>> are we Europe folks.
>>We weren't there when that happened and we don't get those alerts on the
>> phone, we should though.
> Given that this system is tier-2 I do not think we need an immediate
> response, 24 hours should be an acceptable ETA. I would say even 48.
>
> On Fri, Nov 27, 2015 at 2:31 AM, Marcel Ruiz Forns <[email protected]>
> wrote:
>>
>> Thanks, Ori, for having a look at this and restarting EL.
>>
>> I understand it was 01:30 UTC on Friday (today), not Thursday. It went on
>> for 5-6 hours.
>> Unfortunately, the only team-members working full-time yesterday and today
>> are we Europe folks.
>> We weren't there when that happened and we don't get those alerts on the
>> phone, we should though.
>>
>> This problem happened already like a month ago. We'll backfill the missing
>> events and will investigate.
>> Thanks again for the heads-up.
>>
>> On Fri, Nov 27, 2015 at 8:01 AM, Ori Livneh <[email protected]> wrote:
>>>
>>> On Thu, Nov 26, 2015 at 10:46 PM, Ori Livneh <[email protected]> wrote:
>>>>
>>>> Seems that eventlog1001 has not received any events since 01:30 UTC on
>>>> Thursday
>>>>
>>>> http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Miscellaneous+eqiad&h=eventlog1001.eqiad.wmnet&jr=&js=&event=hide&ts=0&v=140128.28&m=bytes_in&vl=bytes%2Fsec&ti=Bytes+Received
>>>>
>>>> This is pretty severe; I'd page if it wasn't a US holiday.
>>>
>>>
>>> Kafka clients on eventlog1001 were in an "Autocommitting consumer offset"
>>> death-loop and not receiving any events from the Kafka brokers. I ran
>>> eventloggingctl stop / eventloggingctl start and they recovered. Needs to be
>>> investigated more thoroughly. Otto, can you follow up?
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>> --
>> Marcel Ruiz Forns
>> Analytics Developer
>> Wikimedia Foundation

--
Oliver Keyes
Count Logula
Wikimedia Foundation
