Update on https://phabricator.wikimedia.org/T146840
I will be closing this ticket shortly, as we have already looked at that
problem. It was related to a MediaWiki performance regression ultimately
caused by a bug in Chrome.

On Fri, Dec 16, 2016 at 11:35 AM, Nuria Ruiz <[email protected]> wrote:

> I was about to send a similar e-mail: unless you tag us with Analytics,
> we will not see the issues. Thus, if issues are important enough and we
> have not responded (for the most part we respond quite fast to
> operational issues), DO ping us on IRC. That is what the channel is for.
>
> It is very unlikely that 3 months after the fact we would know what
> happened on EL on September 9th. We retain neither operational logs nor
> data logs that long, so that ticket would probably be closed without
> resolution because we did not learn about it promptly enough.
>
> Thanks,
>
> Nuria
>
> On Fri, Dec 16, 2016 at 11:16 AM, Dan Andreescu <[email protected]>
> wrote:
>
>> Thanks for the email, Tilman. I will read it in depth and look closely
>> at the issues, but I want to point out something majorly important:
>>
>> *** We are NOT certain to see tasks unless they're tagged with
>> "ANALYTICS". We have an outstanding ask from the phab team and upstream
>> to solve issues that will help us get around this limitation. But in
>> the meantime, if you want us to see a task you MUST tag it with
>> Analytics. ***
>>
>> As a result, I personally didn't see these tasks until your email just
>> now. I hope my instant response and reaction will help prove that I
>> take them seriously. I have tagged those tasks with Analytics and also
>> put them in our working board to give them immediate priority.
>>
>> p.s. the "ERROR 2013 (HY000): Lost connection to MySQL server during
>> query" errors are, as far as I understand, just time-outs that help the
>> DBA teams manage performance on the analytics servers. I have never
>> seen them affect results, and wikimetrics has a way of actively waking
>> up connections that die in this way.
>>
>> On Fri, Dec 16, 2016 at 1:59 PM, Tilman Bayer <[email protected]>
>> wrote:
>>
>>> (This is a little note I have meant to write for a while. Sending it
>>> both as a heads-up for other people who work with this data - many may
>>> have encountered some of these issues, but not everybody may be aware
>>> of all of them - and as a contribution to the discussion about the
>>> Analytics team's "operational excellence" quarterly goal
>>> <https://www.mediawiki.org/wiki/Wikimedia_Engineering/2016-17_Q3_Goals#Analytics>
>>> for Q3.)
>>>
>>> So, EventLogging has been a highly useful part of our analytics
>>> infrastructure for years now, critical for the work of many teams.
>>> However, over the course of this year there have been several
>>> longstanding issues that make me wonder if we are giving it enough
>>> attention infrastructure-wise.
>>>
>>> 1. https://phabricator.wikimedia.org/T146840 Major loss of events in
>>> many different schemas, apparently differing by browser family. This
>>> affected e.g. one of the main metrics we've been using to evaluate
>>> hovercards (page previews) in the reading Web team and was the reason
>>> we had to restrict the analysis of recent A/B tests there to Firefox
>>> only. It also created confusion for users of the Discovery
>>> department's mobile search dashboard and affected the Edit schema as
>>> well. No reaction on the task from Analytics since September 28.
>>>
>>> 2. https://phabricator.wikimedia.org/T142667 Duplicate (spurious)
>>> EventLogging rows, a long-term issue first observed, independently, by
>>> people from the Reading web team and myself around April/May. The
>>> effect on query results is small in most cases, but significant in
>>> some, and in any case does not raise confidence in the quality of the
>>> data - we would at least like to know what the most likely
>>> explanations are. No reaction from Analytics since August, despite
>>> four "The World Burns" tokens by other data analysts and a reminder
>>> from Reading management.
>>>
>>> 3. "ERROR 2013 (HY000): Lost connection to MySQL server during query"
>>> and "ERROR 2006 (HY000): MySQL server has gone away" when trying to
>>> query EL data from stat1003. Happening infrequently but often enough
>>> to be a major nuisance at times. (I haven't filed a Phabricator task
>>> for this yet, but brought it up on IRC various times. Arguably more of
>>> a database/service quality issue, but I'm not certain it can't affect
>>> query results as well.)
>>>
>>> There are various other EL issues I have been encountering more
>>> sporadically (and in some cases still need to file Phabricator tasks
>>> for), but these are some of the most important.
>>>
>>> I am wondering whether this list may be a better venue for raising
>>> awareness when things get stale on Phabricator.
>>> --
>>> Tilman Bayer
>>> Senior Analyst
>>> Wikimedia Foundation
>>> IRC (Freenode): HaeB
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
