Thank you, Dan, and everyone else who's been involved in fixing this.

Dan

On 15 September 2015 at 19:23, Dan Andreescu <[email protected]>
wrote:

> I confirmed this on IRC, but just feeding the archives here.  I'm also
> convinced that the client IP hashing bug we just found explains this
> problem.  It's good we took a look at the other problems, but the main one
> seems to be the IP hashing.  We'll brain bounce more tomorrow on how to fix
> that.
>
> On Tue, Sep 15, 2015 at 6:23 PM, Oliver Keyes <[email protected]>
> wrote:
>
>> Update; I read Dan's thread about hashing, read this thread, and a
>> penny dropped ;).
>>
>> This is totally explainable by the fact that we /expect/ to see
>> multiple pageIDs per IP. And we are! The hashing problem just means
>> those aren't /appearing/ to be the same IP.
>>
>> On 15 September 2015 at 18:05, Erik Bernhardson
>> <[email protected]> wrote:
>> > We've deployed the change to bucketing, but we are still seeing the same
>> > issue in the collected data.
>> >
>> > Again, we are generating a unique 64-bit random number when the user gets
>> > to the page. We are seeing this same 64-bit number being reported by
>> > multiple IP addresses.
>> >
>> > Since deploying the new schema number with the updated bucket selection,
>> > we have seen 13 distinct tokens coming from 42 distinct IP addresses. This
>> > shouldn't be possible.
>> >
>> > mysql:[email protected] [log]> select count(distinct
>> > clientIp) from CompletionSuggestions_13630018;
>> > +--------------------------+
>> > | count(distinct clientIp) |
>> > +--------------------------+
>> > |                       42 |
>> > +--------------------------+
>> > 1 row in set (0.00 sec)
>> >
>> > mysql:[email protected] [log]> select count(distinct
>> > event_pageViewToken) from CompletionSuggestions_13630018;
>> >
>> > +-------------------------------------+
>> > | count(distinct event_pageViewToken) |
>> > +-------------------------------------+
>> > |                                  13 |
>> > +-------------------------------------+
>> > 1 row in set (0.00 sec)
>> >
>> >
>> >
>> > My best guess at this point is that something has changed in the way
>> > these clientIps are collected, and the values are now incorrect.
>> >
>> >
>> > On Mon, Sep 14, 2015 at 1:32 PM, Erik Bernhardson
>> > <[email protected]> wrote:
>> >>
>> >> Thanks for taking a look over this. I've incorporated your suggestions
>> >> into a patch[1] and, if all looks good, will send that out in SWAT. We
>> >> should be able to look at the data collected overnight and see if things
>> >> are more sane tomorrow.
>> >>
>> >> [1] https://gerrit.wikimedia.org/r/#/c/238306/
>> >>
>> >> On Mon, Sep 14, 2015 at 11:56 AM, Gergo Tisza <[email protected]>
>> >> wrote:
>> >>>
>> >>> You are queueing a logging callback every time a request is sent (which
>> >>> is roughly every time the user types another character in the search
>> >>> box) until the tracking module finishes loading and
>> >>> mw.searchSuggest.request is restored. On a slow connection the user
>> >>> might type several characters and trigger several log events by then.
>> >>> If you filter for queries from the same non-unique IP, you will probably
>> >>> see something like "a", "ab", "abc"...
>> >>>
>> >>> _______________________________________________
>> >>> Analytics mailing list
>> >>> [email protected]
>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation
>>
>
>
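
For the archives, here is a minimal sketch of the effect Oliver describes
above: one real IP showing up as several distinct clientIp values once the
hashing is inconsistent. The thread does not say what the hashing bug
actually was, so the per-event salt below is purely an assumption for
illustration, and hash_ip, the tokens, and the addresses are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def hash_ip(ip: str, salt: bytes) -> str:
    """Keyed hash of an IP address (hypothetical helper, not the real pipeline)."""
    return hmac.new(salt, ip.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical events: (pageViewToken, raw client IP).
# Each token comes from exactly one real IP, as we would expect.
events = [
    ("token-a", "198.51.100.7"),
    ("token-a", "198.51.100.7"),
    ("token-a", "198.51.100.7"),
    ("token-b", "203.0.113.9"),
    ("token-b", "203.0.113.9"),
]

# ASSUMPTION: the salt is not stable across events. Here it simply cycles
# through three values; the real cause of the bug is not given in the thread.
salts = [b"salt-1", b"salt-2", b"salt-3"]

# With a single stable salt, distinct hashes match distinct real IPs.
consistent = {hash_ip(ip, salts[0]) for _, ip in events}

# With an unstable salt, the same IP hashes to several different values.
inconsistent = {
    hash_ip(ip, salts[i % len(salts)]) for i, (_, ip) in enumerate(events)
}

# Per-token view: how many "different IPs" each token appears to come from.
ips_per_token = defaultdict(set)
for i, (token, ip) in enumerate(events):
    ips_per_token[token].add(hash_ip(ip, salts[i % len(salts)]))

distinct_tokens = {token for token, _ in events}

print(len(distinct_tokens), "tokens,", len(consistent), "IPs with a stable salt")
print(len(distinct_tokens), "tokens,", len(inconsistent), "IPs with an unstable salt")
for token, hashed in sorted(ips_per_token.items()):
    print(token, "seen from", len(hashed), "hashed IPs")
```

Grouping by token and counting distinct clientIp values, as in the last
loop, is the same kind of check as the two count(distinct ...) queries
quoted above, just broken down per token.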


-- 
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation