Update on https://phabricator.wikimedia.org/T146840

I will be closing this ticket shortly, as we have already looked at that
problem. It was related to a MediaWiki performance regression ultimately
caused by a bug in Chrome.

On Fri, Dec 16, 2016 at 11:35 AM, Nuria Ruiz <[email protected]> wrote:

> I was about to send a similar e-mail: unless you tag tasks with Analytics,
> we will not see the issues. So if an issue is important enough and we have
> not responded (for the most part we respond quite fast to operational
> issues), DO ping us on IRC. That is what the channel is for.
>
> It is very unlikely that, 3 months after the fact, we would know what
> happened on EL on September 9th. We retain neither operational logs nor
> data logs that long, so that ticket would probably be closed without
> resolution because we did not learn about it promptly enough.
>
> Thanks,
>
> Nuria
>
> On Fri, Dec 16, 2016 at 11:16 AM, Dan Andreescu <[email protected]>
> wrote:
>
>> Thanks for the email, Tilman.  I will read it in depth and look closely
>> at the issues, but I want to point out something majorly important:
>>
>> *** We are NOT certain to see tasks unless they're tagged with
>> "ANALYTICS".  We have an outstanding ask from the phab team and upstream to
>> solve issues that will help us get around this limitation.  But in the
>> meantime, if you want us to see a task you MUST tag it with Analytics. ***
>>
>> As a result, I personally didn't see these tasks until your email just
>> now.  I hope my instant response and reaction will help prove that I take
>> them seriously.  I have tagged those tasks with Analytics and also put them
>> in our working board to give them immediate priority.
>>
>> p.s. the "ERROR 2013 (HY000): Lost connection to MySQL server during
>> query" errors are, as far as I understand, just time-outs that help the DBA
>> teams manage performance on the analytics servers.  I have never seen them
>> affect results, and wikimetrics has a way of actively waking up connections
>> that die in this way.
>>
>> On Fri, Dec 16, 2016 at 1:59 PM, Tilman Bayer <[email protected]>
>> wrote:
>>
>>> (This is a little note I have meant to write for a while. Sending it
>>> both as a heads-up for other people who work with this data - many may have
>>> encountered some of these issues, but not everybody may be aware of all of
>>> them - and as a contribution to the discussion about the Analytics team's
>>> "operational excellence" quarterly goal
>>> <https://www.mediawiki.org/wiki/Wikimedia_Engineering/2016-17_Q3_Goals#Analytics>
>>> for Q3.)
>>>
>>> So, EventLogging has been a highly useful part of our analytics
>>> infrastructure for years now, critical for the work of many teams. However,
>>> over the course of this year there have been several longstanding issues
>>> that make me wonder if we are giving it enough attention
>>> infrastructure-wise.
>>>
>>> 1. https://phabricator.wikimedia.org/T146840 Major loss of events in
>>> many different schemas, apparently differing by browser family. This
>>> affected e.g. one of the main metrics we've been using to evaluate
>>> hovercards (page previews) in the Reading Web team and was the reason we
>>> had to restrict the analysis of recent A/B tests there to Firefox only. It
>>> also created confusion for users of the Discovery department's mobile
>>> search dashboard and affected the Edit schema as well. No reaction on the
>>> task from Analytics since September 28.
>>>
>>> 2. https://phabricator.wikimedia.org/T142667 Duplicate (spurious)
>>> EventLogging rows, a long-term issue first observed, independently, by
>>> people from the Reading Web team and myself around April/May. The effect on
>>> query results is small in most cases, but significant in some, and in any
>>> case does not raise confidence in the quality of the data - we would at
>>> least like to know what the most likely explanations are. No reaction from
>>> Analytics since August, despite four "The World Burns" tokens by other data
>>> analysts and a reminder from Reading management.
>>>
>>> 3. "ERROR 2013 (HY000): Lost connection to MySQL server during query"
>>> and "ERROR 2006 (HY000): MySQL server has gone away" when trying to query
>>> EL data from stat1003. Happening infrequently but often enough to be a
>>> major nuisance at times. (I haven't filed a Phabricator task for this yet,
>>> but brought it up on IRC various times. Arguably more of a database/service
>>> quality issue, but I'm not certain it can't affect query results as well.)
>>>
>>> There are various other EL issues I have been encountering more
>>> sporadically (and in some cases still need to file Phabricator tasks for),
>>> but these are some of the most important.
>>>
>>> I am wondering whether this list may be a better venue for raising
>>> awareness when things get stale on Phabricator.
>>> --
>>> Tilman Bayer
>>> Senior Analyst
>>> Wikimedia Foundation
>>> IRC (Freenode): HaeB
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
