Andrew just deployed the change to go back to a single eventlogging processor. So as of right now-ish, IPs should be hashed consistently. Going forward, we'll only add parallel processors when we can ensure consistent hashing across them.
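For the archives, here is a minimal sketch of why parallel processors broke the hashing and what "consistent hashing across them" would require. It is not the eventlogging code: the `Processor` class, the `hash_ip` helper and the choice of HMAC-SHA256 are assumptions made for illustration only. The field names `clientIp` and `event_pageViewToken` are the ones from the queries quoted below.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_ip(ip: str, key: bytes) -> str:
    """Return a keyed hash of an IP address (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


class Processor:
    """Hypothetical stand-in for one eventlogging processor."""

    def __init__(self, key: Optional[bytes] = None):
        # Assumed failure mode: with no key supplied, each processor makes up
        # its own random key, so the same IP hashes differently per processor.
        self.key = key if key is not None else os.urandom(32)

    def process(self, event: dict) -> dict:
        # Copy the event and replace the raw IP with its keyed hash.
        out = dict(event)
        out["clientIp"] = hash_ip(event["clientIp"], self.key)
        return out


event = {"clientIp": "192.0.2.10", "event_pageViewToken": "token-1"}

# Independent keys: one client shows up under two different hashed IPs.
a, b = Processor(), Processor()
inconsistent = {p.process(event)["clientIp"] for p in (a, b)}

# Shared key: one client gets the same hashed IP on every processor.
shared_key = os.urandom(32)
c, d = Processor(shared_key), Processor(shared_key)
consistent = {p.process(event)["clientIp"] for p in (c, d)}

print(len(inconsistent), len(consistent))
```

With independent keys a single page view token can appear to come from as many "IPs" as there are processors, which matches the symptom reported in the thread. Running one processor sidesteps this; sharing the key is one way parallel processors could agree.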
On Wed, Sep 16, 2015 at 10:53 AM, Andrew Otto <[email protected]> wrote:
> How urgent is this? An easy fix right now would be to turn off the
> parallelized processors and run just one. We haven't yet increased
> traffic, so we can run eventlogging like we were before, with only one
> processor. If not urgent, we will implement a proper fix. If urgent, it
> is easy for us to fix in the short term.
>
> On Tue, Sep 15, 2015 at 11:51 PM, Dan Garry <[email protected]> wrote:
>
>> Thank you, Dan, and everyone else who's been involved in fixing this.
>>
>> Dan
>>
>> On 15 September 2015 at 19:23, Dan Andreescu <[email protected]>
>> wrote:
>>
>>> I confirmed this on IRC, but just feeding the archives here. I'm also
>>> convinced that the client IP hashing bug we just found explains this
>>> problem. It's good we took a look at the other problems, but the main one
>>> seems the IP hashing. We'll brain bounce more tomorrow on how to fix that.
>>>
>>> On Tue, Sep 15, 2015 at 6:23 PM, Oliver Keyes <[email protected]>
>>> wrote:
>>>
>>>> Update; I read Dan's thread about hashing, read this thread, and a
>>>> penny dropped ;).
>>>>
>>>> This is totally explainable by the fact that we /expect/ to see
>>>> multiple pageIDs per IP. And we are! The hashing problem just means
>>>> those aren't /appearing/ to be the same IP.
>>>>
>>>> On 15 September 2015 at 18:05, Erik Bernhardson
>>>> <[email protected]> wrote:
>>>> > We've deployed the change to bucketing, but we are still seeing the same
>>>> > issue in the collected data.
>>>> >
>>>> > Again we are generating a unique 64 bit random number when the user gets to
>>>> > the page. We are seeing this same 64 bit unique number being reported by
>>>> > multiple ip addresses.
>>>> >
>>>> > Since deploying the new schema number with the updated bucket selection we
>>>> > have seen 13 distinct tokens coming from 42 distinct ip addresses. This
>>>> > shouldn't be possible.
>>>> >
>>>> > mysql:[email protected] [log]> select count(distinct clientIp) from CompletionSuggestions_13630018;
>>>> > +--------------------------+
>>>> > | count(distinct clientIp) |
>>>> > +--------------------------+
>>>> > |                       42 |
>>>> > +--------------------------+
>>>> > 1 row in set (0.00 sec)
>>>> >
>>>> > mysql:[email protected] [log]> select count(distinct event_pageViewToken) from CompletionSuggestions_13630018;
>>>> >
>>>> > +-------------------------------------+
>>>> > | count(distinct event_pageViewToken) |
>>>> > +-------------------------------------+
>>>> > |                                  13 |
>>>> > +-------------------------------------+
>>>> > 1 row in set (0.00 sec)
>>>> >
>>>> > My best guess at this point is that something has changed in the way these
>>>> > clientIp's are collected and is incorrect.
>>>> >
>>>> >
>>>> > On Mon, Sep 14, 2015 at 1:32 PM, Erik Bernhardson
>>>> > <[email protected]> wrote:
>>>> >>
>>>> >> Thanks for taking a look over this. I've incorporated your suggestions
>>>> >> into a patch[1] and if all looks good will send that out in SWAT. We should
>>>> >> be able to look at the data collected overnight and see if things are more
>>>> >> sane tomorrow.
>>>> >>
>>>> >> [1] https://gerrit.wikimedia.org/r/#/c/238306/
>>>> >>
>>>> >> On Mon, Sep 14, 2015 at 11:56 AM, Gergo Tisza <[email protected]>
>>>> >> wrote:
>>>> >>>
>>>> >>> You are queueing a logging callback every time a request is sent (which
>>>> >>> is roughly every time the user types another character in the search box)
>>>> >>> until the tracking module finishes loading and mw.searchSuggest.request is
>>>> >>> restored. On a slow connection the user might type several characters and
>>>> >>> trigger several log events by then. If you filter for queries from the same
>>>> >>> non-unique IP, you will probably see something like "a", "ab", "abc"...
>>>>
>>>> --
>>>> Oliver Keyes
>>>> Count Logula
>>>> Wikimedia Foundation
>>
>> --
>> Dan Garry
>> Lead Product Manager, Discovery
>> Wikimedia Foundation
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
