It seems like it would depend on the class of error. 48 hours for
events not syncing, fine. 48 hours of /total data loss/ is a
completely different class of problem.

On 27 November 2015 at 11:35, Nuria Ruiz <[email protected]> wrote:
>> Unfortunately, the only team members working full-time yesterday and today
>> are those of us in Europe.
>> We weren't around when that happened, and we don't get those alerts on our
>> phones, though we should.
> Given that this system is tier-2, I do not think we need an immediate
> response; 24 hours should be an acceptable ETA. I would even say 48.
>
> On Fri, Nov 27, 2015 at 2:31 AM, Marcel Ruiz Forns <[email protected]>
> wrote:
>>
>> Thanks, Ori, for having a look at this and restarting EL.
>>
>> I understand it was 01:30 UTC on Friday (today), not Thursday. It went on
>> for 5-6 hours.
>> Unfortunately, the only team members working full-time yesterday and today
>> are those of us in Europe.
>> We weren't around when that happened, and we don't get those alerts on our
>> phones, though we should.
>>
>> This problem also happened about a month ago. We'll backfill the missing
>> events and investigate.
>> Thanks again for the heads-up.
>>
>> On Fri, Nov 27, 2015 at 8:01 AM, Ori Livneh <[email protected]> wrote:
>>>
>>> On Thu, Nov 26, 2015 at 10:46 PM, Ori Livneh <[email protected]> wrote:
>>>>
>>>> Seems that eventlog1001 has not received any events since 01:30 UTC on
>>>> Thursday
>>>>
>>>>
>>>> http://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Miscellaneous+eqiad&h=eventlog1001.eqiad.wmnet&jr=&js=&event=hide&ts=0&v=140128.28&m=bytes_in&vl=bytes%2Fsec&ti=Bytes+Received
>>>>
>>>> This is pretty severe; I'd page if it weren't a US holiday.
>>>
>>>
>>> Kafka clients on eventlog1001 were in an "Autocommitting consumer offset"
>>> death-loop and not receiving any events from the Kafka brokers. I ran
>>> eventloggingctl stop / eventloggingctl start and they recovered. Needs to be
>>> investigated more thoroughly. Otto, can you follow up?
>>>
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>>
>>
>> --
>> Marcel Ruiz Forns
>> Analytics Developer
>> Wikimedia Foundation



-- 
Oliver Keyes
Count Logula
Wikimedia Foundation