How urgent is this? An easy fix right now would be to turn off the parallelized processors and run just one. We haven't increased traffic yet, so we can run EventLogging as we did before, with a single processor. If this is not urgent, we will implement a proper fix. If it is urgent, the short-term fix is easy for us to make.
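To illustrate why dropping back to one processor would likely sidestep the problem, here is a minimal sketch. It assumes, which this thread does not confirm, that each parallel processor hashes client IPs with its own random key drawn at startup. The `Processor` class, `hash_ip`, and `distinct_hashed_ips` are hypothetical names made up for this example and are not EventLogging code.

```python
import hashlib
import hmac
import os


class Processor:
    """Hypothetical stand-in for one processor (not real EventLogging code)."""

    def __init__(self):
        # Assumption: every processor draws its own random key at startup.
        self.key = os.urandom(16)

    def hash_ip(self, client_ip: str) -> str:
        # Keyed hash of the client IP; the result depends on this processor's key.
        return hmac.new(self.key, client_ip.encode("utf-8"), hashlib.sha1).hexdigest()


def distinct_hashed_ips(processors, client_ip, n_events):
    # Spread n_events from ONE client round-robin over the processors and
    # collect the distinct hashed values that would end up in the table.
    return {
        processors[i % len(processors)].hash_ip(client_ip)
        for i in range(n_events)
    }


parallel = [Processor() for _ in range(4)]
single = [Processor()]

many = distinct_hashed_ips(parallel, "192.0.2.1", 8)
one = distinct_hashed_ips(single, "192.0.2.1", 8)
print(len(many), len(one))
```

Under that assumption, one real client shows up as several distinct clientIp values when its events are spread over several processors, and as exactly one value when a single processor handles everything.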
On Tue, Sep 15, 2015 at 11:51 PM, Dan Garry <[email protected]> wrote:

> Thank you, Dan, and everyone else who's been involved in fixing this.
>
> Dan
>
> On 15 September 2015 at 19:23, Dan Andreescu <[email protected]>
> wrote:
>
>> I confirmed this on IRC, but just feeding the archives here. I'm also
>> convinced that the client IP hashing bug we just found explains this
>> problem. It's good we took a look at the other problems, but the main one
>> seems the IP hashing. We'll brain bounce more tomorrow on how to fix that.
>>
>> On Tue, Sep 15, 2015 at 6:23 PM, Oliver Keyes <[email protected]>
>> wrote:
>>
>>> Update; I read Dan's thread about hashing, read this thread, and a
>>> penny dropped ;).
>>>
>>> This is totally explainable by the fact that we /expect/ to see
>>> multiple pageIDs per IP. And we are! The hashing problem just means
>>> those aren't /appearing/ to be the same IP.
>>>
>>> On 15 September 2015 at 18:05, Erik Bernhardson
>>> <[email protected]> wrote:
>>> > We've deployed the change to bucketing, but we are still seeing the same
>>> > issue in the collected data.
>>> >
>>> > Again we are generating a unique 64 bit random number when the user gets to
>>> > the page. We are seeing this same 64 bit unique number being reported by
>>> > multiple ip addresses.
>>> >
>>> > Since deploying the new schema number with the updated bucket selection we
>>> > have seen 13 distinct tokens coming from 42 distinct ip addresses. This
>>> > shouldn't be possible.
>>> >
>>> > mysql:[email protected] [log]> select count(distinct clientIp) from CompletionSuggestions_13630018;
>>> > +--------------------------+
>>> > | count(distinct clientIp) |
>>> > +--------------------------+
>>> > |                       42 |
>>> > +--------------------------+
>>> > 1 row in set (0.00 sec)
>>> >
>>> > mysql:[email protected] [log]> select count(distinct event_pageViewToken) from CompletionSuggestions_13630018;
>>> >
>>> > +-------------------------------------+
>>> > | count(distinct event_pageViewToken) |
>>> > +-------------------------------------+
>>> > |                                  13 |
>>> > +-------------------------------------+
>>> > 1 row in set (0.00 sec)
>>> >
>>> > My best guess at this point is that something has changed in the way these
>>> > clientIp's are collected and is incorrect.
>>> >
>>> > On Mon, Sep 14, 2015 at 1:32 PM, Erik Bernhardson
>>> > <[email protected]> wrote:
>>> >>
>>> >> Thanks for taking a look over this. I've incorperated your suggestions
>>> >> into a patch[1] and if all looks good will send that out in SWAT. We should
>>> >> be able to look at the data collected overnight and see if things are more
>>> >> sane tomorrow.
>>> >>
>>> >> [1] https://gerrit.wikimedia.org/r/#/c/238306/
>>> >>
>>> >> On Mon, Sep 14, 2015 at 11:56 AM, Gergo Tisza <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> You are queueing a logging callback every time a request is sent (which
>>> >>> is roughly every time the user types another character in the search box)
>>> >>> until the tracking module finishes loading and mw.searchSuggest.request is
>>> >>> restored. On a slow connection the user might type several characters and
>>> >>> trigger several log events by then. If you filter for queries from the same
>>> >>> non-unique IP, you will probably see something like "a", "ab", "abc"...
>>> >>>
>>> >>> _______________________________________________
>>> >>> Analytics mailing list
>>> >>> [email protected]
>>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>> --
>>> Oliver Keyes
>>> Count Logula
>>> Wikimedia Foundation
>
> --
> Dan Garry
> Lead Product Manager, Discovery
> Wikimedia Foundation
