Thank you, Dan, and everyone else who's been involved in fixing this.

Dan

On 15 September 2015 at 19:23, Dan Andreescu <[email protected]> wrote:

> I confirmed this on IRC, but just feeding the archives here. I'm also
> convinced that the client IP hashing bug we just found explains this
> problem. It's good we took a look at the other problems, but the main one
> seems to be the IP hashing. We'll brain bounce more tomorrow on how to fix
> that.
>
> On Tue, Sep 15, 2015 at 6:23 PM, Oliver Keyes <[email protected]>
> wrote:
>
>> Update; I read Dan's thread about hashing, read this thread, and a
>> penny dropped ;).
>>
>> This is totally explainable by the fact that we /expect/ to see
>> multiple pageIDs per IP. And we are! The hashing problem just means
>> those aren't /appearing/ to be the same IP.
>>
>> On 15 September 2015 at 18:05, Erik Bernhardson
>> <[email protected]> wrote:
>> > We've deployed the change to bucketing, but we are still seeing the
>> > same issue in the collected data.
>> >
>> > Again we are generating a unique 64 bit random number when the user
>> > gets to the page. We are seeing this same 64 bit unique number being
>> > reported by multiple IP addresses.
>> >
>> > Since deploying the new schema number with the updated bucket
>> > selection we have seen 13 distinct tokens coming from 42 distinct IP
>> > addresses. This shouldn't be possible.
>> >
>> > mysql:[email protected] [log]> select count(distinct clientIp) from CompletionSuggestions_13630018;
>> > +--------------------------+
>> > | count(distinct clientIp) |
>> > +--------------------------+
>> > |                       42 |
>> > +--------------------------+
>> > 1 row in set (0.00 sec)
>> >
>> > mysql:[email protected] [log]> select count(distinct event_pageViewToken) from CompletionSuggestions_13630018;
>> > +-------------------------------------+
>> > | count(distinct event_pageViewToken) |
>> > +-------------------------------------+
>> > |                                  13 |
>> > +-------------------------------------+
>> > 1 row in set (0.00 sec)
>> >
>> > My best guess at this point is that something has changed in the way
>> > these clientIp's are collected and is incorrect.
>> >
>> > On Mon, Sep 14, 2015 at 1:32 PM, Erik Bernhardson
>> > <[email protected]> wrote:
>> >>
>> >> Thanks for taking a look over this. I've incorporated your
>> >> suggestions into a patch[1] and if all looks good will send that out
>> >> in SWAT. We should be able to look at the data collected overnight
>> >> and see if things are more sane tomorrow.
>> >>
>> >> [1] https://gerrit.wikimedia.org/r/#/c/238306/
>> >>
>> >> On Mon, Sep 14, 2015 at 11:56 AM, Gergo Tisza <[email protected]>
>> >> wrote:
>> >>>
>> >>> You are queueing a logging callback every time a request is sent
>> >>> (which is roughly every time the user types another character in
>> >>> the search box) until the tracking module finishes loading and
>> >>> mw.searchSuggest.request is restored. On a slow connection the user
>> >>> might type several characters and trigger several log events by
>> >>> then. If you filter for queries from the same non-unique IP, you
>> >>> will probably see something like "a", "ab", "abc"...
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
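
For the archives: the two counts in the quoted thread only give totals (13 tokens, 42 clientIp values). A per-token breakdown would show which tokens are being reported under more than one clientIp value. The following is a minimal sketch of that breakdown, not something anyone in the thread ran. It reuses the table and column names from the quoted queries, but it runs against an in-memory SQLite table instead of the real MySQL log database, and the sample rows and the helper name tokens_with_multiple_ips are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows (event_pageViewToken, clientIp).
# These are illustrative values only, not data from the real log table.
ROWS = [
    ("token-a", "hash-1"),
    ("token-a", "hash-2"),
    ("token-a", "hash-2"),
    ("token-b", "hash-3"),
    ("token-c", "hash-4"),
    ("token-c", "hash-5"),
    ("token-c", "hash-6"),
]

# Per-token count of distinct clientIp values, keeping only tokens that
# were reported under more than one clientIp.
QUERY = """
SELECT event_pageViewToken,
       COUNT(DISTINCT clientIp) AS distinct_ips
FROM CompletionSuggestions_13630018
GROUP BY event_pageViewToken
HAVING COUNT(DISTINCT clientIp) > 1
ORDER BY distinct_ips DESC, event_pageViewToken
"""


def tokens_with_multiple_ips(rows):
    """Load rows into an in-memory table and return (token, distinct_ips) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE CompletionSuggestions_13630018 "
            "(event_pageViewToken TEXT, clientIp TEXT)"
        )
        conn.executemany(
            "INSERT INTO CompletionSuggestions_13630018 VALUES (?, ?)", rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for token, distinct_ips in tokens_with_multiple_ips(ROWS):
        print(token, distinct_ips)
```

If the hashing bug is the cause, as Dan Andreescu and Oliver suggest, the same SELECT run against the real table should list most tokens with two or more clientIp values each.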
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
